Abstract. The problem of near-optimal distributed path planning to locally sensed targets is
investigated in the context of large swarms. The proposed algorithm uses only information that
can be locally queried. Rigorous theoretical results on convergence, robustness, and scalability
are established, and the effect of system parameters such as the agent-level communication radius
and agent velocities on global performance is analyzed. The fundamental philosophy of the proposed
approach is to percolate local information across the swarm, enabling agents to indirectly access
the global context. A gradient emerges, reflecting the performance of agents, computed in a
distributed manner via local information exchange between neighboring agents. It is shown that to
follow near-optimal routes to a target which can only be sensed locally, and whose location is not
known a priori, each agent simply needs to move towards its "best" neighbor, where the notion of
"best" is obtained by computing the state-specific language measure of an underlying probabilistic
finite state automaton. The theoretical results are validated in high-fidelity simulation
experiments with in excess of $10^4$ agents.
1. Introduction & Motivation. Path planning in a co-operative environment
is a problem of great interest in multi-agent robotics. Recent developments in
micro machining and MEMS have opened up the possibility of engineering extremely
small and cheap robotic platforms in large numbers. Limited in size, on-board
computational resources and power, such robots can nevertheless potentially exploit
co-operation to accomplish complex tasks [1] [3] [4] including surveillance,
reconnaissance, path finding and collaborative payload conveyance. However, coordinating such
engineered swarms unveils new challenges not encountered in the operation of one or
a few robots [5] [7]. Coordination schemes requiring unique identities for each robot,
explicit routing of point-to-point communication between robots, or centralized
representations of the state of an entire swarm are no longer viable. Thus, any approach
to effectively control swarms must be intrinsically scalable, and must only use
information that is locally available. The immediate question for the control theorist is
whether such algorithms are able to guarantee any level of global performance. This is
precisely the problem that is investigated in this paper, with an affirmative answer: a
distributed scalable control algorithm is proposed that allows very large swarms
(simulation results obtained with $10^4$ agents) to self-organize and find near-global-optimal
routes to locally known targets. The proposed algorithm uses only information that
can be locally queried. Rigorous theoretical results on convergence, robustness, and
scalability are established, and the effect of system parameters such as the agent-level
communication radius and agent velocities on global performance is analyzed.

The fundamental philosophy of the proposed approach is to percolate local
information across the swarm, enabling agents to indirectly access the global context.
A gradient emerges, reflecting the performance of agents, computed in a distributed
manner via local information exchange between neighboring agents. It is shown that
to follow near-optimal routes to a target which can only be sensed locally, and whose
location is not known a priori, each agent simply needs to move towards its "best"
neighbor, where the notion of "best" is obtained by computing the state-specific
language measure [8] of an underlying probabilistic finite state automaton [3].

Gradient based methods in swarm control are not new [10] [11] [12]. The majority
of reported work following this direction draws inspiration from swarming phenomena
observed in nature, where self-organized exploration strategies emerge at the
collective level as a result of simple rules followed by individual agents. To produce the
global behavior, individuals interact by using simple and mostly local
communication protocols. Social insects are a good biological example of organisms collectively
exploring an unknown environment, and they have served as a source of direct
inspiration for research on self-organized cooperative robotic exploration and path formation
in groups of robots [13] [14]. The standard engineering approach of analyzing desired
global patterns and breaking them down into a set of simple rules governing individual
agents is seldom applicable to large populations aspiring to accomplish complex tasks.
Nevertheless, some progress has been made in this direction [15].

"We now know that such synchronized group behavior (of flocking 
birds) is mediated through sensory modalities such as vision, sound, 
pressure and odor detection. Individuals tend to maintain a personal 
space by avoiding those too close to themselves; group cohesion results 
from a longer-range attraction to others; and animals often align their 
direction of travel with that of nearby neighbors. These responses can 
account for many of the group structures we see in nature, includ- 
ing insect swarms and the dramatic vortex-like mills formed by some 
species of fish and bat. By adjusting their motion in response to that 
of near neighbors, individuals in groups both generate, and are
influenced by, their social context - there is no centralized controller."
Collective Minds, D. Couzin [TS] 
On the other hand, the Evolutionary Robotics (ER) methodology [17] allows for an
implementation of a top-down approach, where reinforcement learning via
evolutionary optimization techniques allows assessment of the system's overall performance, and
sequential improvement of control laws. While such heuristic techniques have been shown
to yield robust and scalable systems, assuring global performance has remained an
elusive challenge. The present paper aims to fill this gap by proposing a simple control
approach with provable guarantees on global performance. The key difference from
the reported gradient based techniques lies in the formal model that is developed, and
the associated theoretical results that show that the algorithm achieves near-global
optimality.

To the best of the author's knowledge, such an approach has not been previously
investigated, primarily due to the complexity spike encountered in deriving optimal
solutions in a decentralized environment. Recent investigations [18] [19] into the
solution complexity of decentralized Markov decision processes (MDPs) have shown that the
problem is exceptionally hard even for two agents, illustrating a fundamental divide
between centralized and decentralized control of MDPs. In contrast to the centralized
approach, the decentralized case provably does not admit polynomial-time algorithms.
Furthermore, assuming EXP $\neq$ NEXP, the problems require super-exponential time
to solve in the worst case. Moreover, since distributed systems with access to
only local information can be mapped to partially observable MDPs, it follows from
[20] that such problems are non-approximable, negating the possibility of obtaining
optimal solutions to approximate representations.

Such negative results do not preclude the possibility of obtaining near-optimal
solutions efficiently, when the set of models considered is a strictly smaller subset of
general MDPs. This is precisely what we achieve in this paper, by casting the path
planning problem as a performance maximization problem for an underlying probabilistic
finite state automaton (PFSA). In spite of similar Markovian assumptions, the PFSA
model is distinct from general MDPs (see Section 2.1), and admits decentralized
manipulation, such that the control policy, on convergence, is within an $\epsilon$ bound of
the global optimum. Furthermore, one can freely choose the error bound $\epsilon$ (and make
it as small as one wishes), with the caveat that the convergence time increases (with
no finite upper bound) with decreasing $\epsilon$.

The present work is also distinct from Potential Field-based methodologies (PFM)
widely studied in the centralized single-or-few robot scenarios [23] [22]. Early PFM
implementations had substantial shortcomings [23], suffering from trap situations,
instability in narrow passages etc. Some of these shortcomings have been addressed
recently, leading to globally convergent potential planners [24] [25] [26] [27] [28] [29] [30].
These approaches are computationally hard for single-or-few robots, and thus not 
applicable in the current context. Some variations of the latter approaches have
attempted to reduce the complexity by combining search algorithms and potential fields
[31] [32] [33], virtual obstacle methods Ell ES], sub-goal methods [SHI EH,
wall-following methods [38] EH EH HQ] etc. Nevertheless, since heuristic strategies only
based on local environment information are usually applied, many of these methods 
cannot guarantee convergence in general. 

The rest of the paper is organized in six sections. Section 2 briefly summarizes the
theory of quantitative measures of probabilistic regular languages, and the pertinent
approaches to centralized performance maximization of PFSA. Section 3 develops the
PFSA model for a swarm, and Section 4 presents the theoretical development for
decentralized PFSA optimization, thus solving the problem of computing $\epsilon$-optimal
routes in a static or frozen swarm. Section 5 extends the results to a dynamic swarm,
where route optimization and positional updates are carried out simultaneously.
Section 6 validates the theoretical development with high fidelity simulation results. The
paper concludes in Section 7 with recommendations for future work.

2. Background: Language Measure Theory. This section summarizes the
concept of signed real measure of probabilistic regular languages, and its application in
performance optimization of probabilistic finite state automata (PFSA) [8]. A string
over an alphabet (i.e. a non-empty finite set) $\Sigma$ is a finite-length sequence of symbols
from $\Sigma$ [41]. The Kleene closure of $\Sigma$, denoted by $\Sigma^*$, is the set of all finite-length
strings of symbols including the null string $\epsilon$. $xy$ is the concatenation of strings $x$ and
$y$, and the null string $\epsilon$ is the identity element of the concatenative monoid.

Definition 1 (PFSA). A PFSA $G$ over an alphabet $\Sigma$ is a sextuple
$(Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$, where $Q$ is a set of states;
$\delta : Q \times \Sigma^* \to Q$ is the (possibly partial) transition map;
$\widetilde{\Pi} : Q \times \Sigma \to [0, 1]$ is an output mapping or the probability morph function that
specifies the state-specific symbol generation probabilities, satisfying
$\forall q_i \in Q, \sigma \in \Sigma$, $\widetilde{\Pi}(q_i, \sigma) \geq 0$, and $\sum_{\sigma \in \Sigma} \widetilde{\Pi}(q_i, \sigma) = 1$;
the state characteristic function $\chi : Q \to [-1, 1]$ assigns
a signed real weight to each state reflecting the immediate pay-off from visiting that
state; and $\mathscr{C}$ is the set of controllable transitions that can be disabled (see Definition 2)
by an imposed control policy.

Definition 2 (Control Philosophy). If $\delta(q_i, \sigma) = q_k$, then the disabling of $\sigma$ at
$q_i$ prevents the state transition from $q_i$ to $q_k$. Thus, disabling a transition $\sigma$ at a state
$q_i$ replaces the original transition with a self-loop with identical occurrence probability,
i.e. we now have $\delta(q_i, \sigma) = q_i$. Transitions that can be so disabled are controllable,
and belong to the set $\mathscr{C}$.

Definition 3. The language $L(q_i)$ generated by a PFSA $G$ initialized at the state
$q_i \in Q$ is defined as: $L(q_i) = \{s \in \Sigma^* \mid \delta(q_i, s) \in Q\}$. Similarly, for every $q_j \in Q$,
$L(q_i, q_j)$ denotes the set of all strings that, starting from the state $q_i$, terminate at the
state $q_j$, i.e., $L(q_i, q_j) = \{s \in \Sigma^* \mid \delta(q_i, s) = q_j \in Q\}$.

Definition 4 (State Transition Matrix). The state transition probability
matrix $\Pi \in [0, 1]^{\mathrm{Card}(Q) \times \mathrm{Card}(Q)}$, for a given PFSA, is defined as:
$\forall q_i, q_j \in Q$, $\Pi_{ij} = \sum_{\sigma \in \Sigma \ \mathrm{s.t.}\ \delta(q_i, \sigma) = q_j} \widetilde{\Pi}(q_i, \sigma)$.
Note that $\Pi$ is a square non-negative stochastic matrix [42],
where $\Pi_{ij}$ is the probability of transitioning from $q_i$ to $q_j$.
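As an illustration of Definitions 2 and 4, the following minimal sketch assembles the state transition matrix from a transition map and a morph function, and redirects disabled transitions to self-loops with unchanged probability. The dictionary-based representation, the function name `transition_matrix`, and the two-state example are assumptions made here for illustration; they are not taken from the paper.

```python
from collections import defaultdict


def transition_matrix(states, delta, morph, disabled=frozenset()):
    """Return Pi as nested dicts: Pi[q_i][q_j] is the summed probability of
    all symbols that take q_i to q_j (Definition 4).

    delta    : dict mapping (state, symbol) -> next state (hypothetical encoding)
    morph    : dict mapping (state, symbol) -> generation probability
    disabled : set of (state, symbol) pairs disabled by a supervisor; each is
               replaced by a self-loop with the same probability (Definition 2)
    """
    Pi = {q: defaultdict(float) for q in states}
    for (q, sym), target in delta.items():
        if (q, sym) in disabled:
            target = q  # disabled transition becomes a self-loop
        Pi[q][target] += morph[(q, sym)]
    return Pi


# Hypothetical two-state, two-symbol PFSA.
states = ["q1", "q2"]
delta = {("q1", "a"): "q2", ("q1", "b"): "q1",
         ("q2", "a"): "q1", ("q2", "b"): "q2"}
morph = {("q1", "a"): 0.7, ("q1", "b"): 0.3,
         ("q2", "a"): 0.4, ("q2", "b"): 0.6}

Pi = transition_matrix(states, delta, morph)
Pi_supervised = transition_matrix(states, delta, morph, disabled={("q1", "a")})
```

Note that disabling never changes a row sum, so the supervised matrix remains stochastic.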

Notation 1. We use matrix notations interchangeably for the morph function $\widetilde{\Pi}$.
In particular, $\widetilde{\Pi}_{ij} = \widetilde{\Pi}(q_i, \sigma_j)$ with $q_i \in Q$, $\sigma_j \in \Sigma$. Note that
$\widetilde{\Pi} \in [0, 1]^{\mathrm{Card}(Q) \times \mathrm{Card}(\Sigma)}$ is not necessarily square, but each row sums up to unity.
A signed real measure [IS] $\nu_\theta^i : 2^{L(q_i)} \to \mathbb{R} \equiv (-\infty, +\infty)$ is constructed on the
$\sigma$-algebra $2^{L(q_i)}$ [8], implying that every singleton string set $\{s \in L(q_i)\}$ is a measurable set.

Definition 5 (Language Measure). Let $\omega \in L(q_i, q_j) \subseteq 2^{L(q_i)}$. The signed
real measure $\nu_\theta^i$ of every singleton string set $\{\omega\}$ is defined as:
$\nu_\theta^i(\{\omega\}) = \theta(1 - \theta)^{|\omega|} \widetilde{\Pi}(q_i, \omega) \chi(q_j)$.
For every choice of the parameter $\theta \in (0, 1)$, the signed real measure
of a sublanguage $L(q_i, q_j) \subseteq L(q_i)$ is defined as:
$\nu_\theta^i(L(q_i, q_j)) = \sum_{\omega \in L(q_i, q_j)} \theta(1 - \theta)^{|\omega|} \widetilde{\Pi}(q_i, \omega) \chi_j$.
The measure of $L(q_i)$ is defined as $\nu_\theta^i(L(q_i)) = \sum_{q_j \in Q} \nu_\theta^i(L(q_i, q_j))$.

Notation 2. For a given PFSA, we interpret the set of measures $\nu_\theta^i(L(q_i))$ as
a real-valued vector of length $\mathrm{Card}(Q)$ and denote $\nu_\theta^i(L(q_i))$ as $\nu_\theta|_i$. The language
measure can be expressed vectorially as (where the inverse exists for $\theta \in (0, 1]$ [8]):

    $\nu_\theta = \theta [I - (1 - \theta)\Pi]^{-1} \chi$    (2.1)

In the limit of $\theta \to 0^+$, the language measure of singleton strings can be
interpreted to be the product of the conditional generation probability of the string and the
characteristic weight on the terminating state. Hence, the smaller the characteristic, or
the smaller the probability of generating the string, the smaller is its measure. Thus, if the
characteristic values are chosen to represent the control specification, with more
positive weights given to more desirable states, then the measure represents how good the
particular string is with respect to the given specification, and the given model. The
limiting language measure $\nu_0|_i = \lim_{\theta \to 0^+} \theta [I - (1 - \theta)\Pi]^{-1} \chi \,|_i$ sums up the limiting
measures of each string starting from $q_i$, and thus captures how good $q_i$ is, based
not only on its own characteristic, but also on how good the strings generated in future
from $q_i$ are. It is thus a quantification of the impact of $q_i$, in a probabilistic sense, on
future dynamical evolution [8].
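The vector form (2.1) can be evaluated numerically without an explicit matrix inverse. The sketch below is one possible way to do so, assuming a dense list-of-lists representation of the stochastic matrix; the function name `language_measure` and the two-state example are hypothetical. It iterates the fixed-point relation $\nu = \theta\chi + (1-\theta)\Pi\nu$, which is equivalent to (2.1) and is a contraction for $\theta \in (0, 1]$.

```python
def language_measure(Pi, chi, theta, tol=1e-12, max_iter=1_000_000):
    """Approximate nu_theta = theta * [I - (1 - theta) * Pi]^{-1} * chi.

    Pi    : row-stochastic matrix as a list of lists
    chi   : characteristic weights, one per state, each in [-1, 1]
    theta : parameter in (0, 1]
    The update nu <- theta*chi + (1-theta)*Pi*nu contracts with factor
    (1 - theta), so smaller theta needs more iterations.
    """
    if not 0.0 < theta <= 1.0:
        raise ValueError("theta must lie in (0, 1]")
    n = len(chi)
    nu = [0.0] * n
    for _ in range(max_iter):
        new = [theta * chi[i]
               + (1.0 - theta) * sum(Pi[i][j] * nu[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new
    raise RuntimeError("fixed-point iteration did not converge")


# Hypothetical example: state 1 is absorbing with weight +1, and state 0
# either stays put or moves to state 1 with equal probability.
Pi = [[0.5, 0.5],
      [0.0, 1.0]]
chi = [0.0, 1.0]
nu = language_measure(Pi, chi, theta=0.1)
```

For this example the fixed point can be solved by hand: the absorbing state has measure 1, and the other state has measure $(1-\theta)/(1+\theta)$, which tends to 1 as $\theta \to 0^+$, in line with the interpretation of the limiting measure given above.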

Definition 6 (Supervisor). A supervisor is a control policy disabling a specific
subset of the set $\mathscr{C}$ of controllable transitions. Hence there is a bijection between the
set of all possible supervision policies and the power set $2^{\mathscr{C}}$.

Language measure allows quantitative comparison of different supervision policies. 

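Definition 7 (Optimal Supervision Problem). Given $G = (Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$,
compute a supervisor disabling $\mathscr{D}^\star \subseteq \mathscr{C}$ s.t.
$\nu_0^\star \geq_{(\mathrm{Elementwise})} \nu_0^\dagger \ \ \forall \mathscr{D}^\dagger \subseteq \mathscr{C}$, where
$\nu_0^\star$, $\nu_0^\dagger$ are the limiting measure vectors of the supervised plants $G^\star$, $G^\dagger$
under $\mathscr{D}^\star$, $\mathscr{D}^\dagger$ respectively.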

The solution to the optimal supervision problem is obtained in [8] by designing
an optimal policy using $\nu_\theta$ with $\theta \in (0, 1)$. To ensure that the computed
optimal policy coincides with the one for $\theta \to 0^+$, the authors choose a small non-zero
value for $\theta$ in each iteration step of the design algorithm. To address numerical
issues, the algorithms reported in [8] compute how small a $\theta$ is actually sufficient to
ensure that the optimal solution computed with this value of $\theta$ coincides with the
optimal policies for any smaller value, i.e., they compute the critical lower bound $\theta_\star$
(this is closely related to the notion of Blackwell optimality; see Section 2.1).
Moreover, the solution obtained is stationary, efficiently computable, and can be shown to
be the unique maximally permissive policy among ones with maximal performance.
Language-measure-theoretic optimization is not a search (and has several key
advantages over Dynamic Programming based approaches; see Section 2.1 for details); it
is an iterative sequence of combinatorial manipulations that monotonically improves
the measures, leading to element-wise maximization of $\nu_\theta$ (see [8]). It is shown in [8]
that $\lim_{\theta \to 0^+} \theta [I - (1 - \theta)\Pi]^{-1} \chi = \mathscr{P}\chi$, where the $i$-th row of $\mathscr{P}$ (denoted as $p^i$) is the
stationary probability vector for the PFSA initialized at state $q_i$. In other words, $\mathscr{P}$ is
the Cesaro limit of the stochastic matrix $\Pi$, satisfying $\mathscr{P} = \lim_{k \to \infty} \frac{1}{k} \sum_{j=0}^{k-1} \Pi^j$ [42].
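To make the Cesaro limit concrete, the following sketch averages the first $k$ powers of a stochastic matrix. The helper names and the periodic two-state chain are assumptions chosen for illustration: the powers of this particular matrix alternate forever and have no ordinary limit, yet their running average settles, which is why the Cesaro limit rather than a plain limit of powers appears in the text.

```python
def mat_mul(A, B):
    """Product of two square matrices given as lists of lists."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def cesaro_average(Pi, k):
    """Return (1/k) * sum_{j=0}^{k-1} Pi^j, a finite-k stand-in for the limit."""
    n = len(Pi)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # Pi^0
    acc = [[0.0] * n for _ in range(n)]
    for _ in range(k):
        for i in range(n):
            for j in range(n):
                acc[i][j] += power[i][j]
        power = mat_mul(power, Pi)
    return [[x / k for x in row] for row in acc]


# Hypothetical periodic chain: the two states swap deterministically.
Pi = [[0.0, 1.0],
      [1.0, 0.0]]
chi = [1.0, -1.0]

P_bar = cesaro_average(Pi, 1000)
# Finite-k approximation of the limiting measure vector (Cesaro limit times chi).
limit_measure = [sum(P_bar[i][j] * chi[j] for j in range(2)) for i in range(2)]
```

For this chain, solving (2.1) by hand gives a measure of $\theta/(2-\theta)$ at the first state, which vanishes as $\theta \to 0^+$ and so agrees with the averaged value.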

Proposition 1 (See [5]). Since the optimization maximizes the language
measure element-wise for $\theta \to 0^+$, it follows that for the optimally supervised plant, the
standard inner product $\langle p^i, \chi \rangle$ is maximized, irrespective of the starting state $q_i \in Q$.

Notation 3. The optimal $\theta$-dependent measure for a PFSA is denoted as $\nu_\theta^\star$ and
the limiting measure as $\nu^\star$.



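For completeness, the key algorithms are included as Algorithms 1 and 2.

Algorithm 1: Computation of Optimal Supervisor

input : P, $\chi$, $\mathscr{C}$
output: Optimal set of disabled transitions $\mathscr{D}^\star$
begin
    Set $\mathscr{D}^{[0]} = \emptyset$;                                /* Initial disabling set */
    Set $\widetilde{\Pi}^{[0]} = \widetilde{\Pi}$;                      /* Initial event prob. matrix */
    Set $\theta_\star^{[0]} = 0.99$, Set $k = 1$, Set Terminate = false;
    while (Terminate == false) do
        Compute $\theta_\star^{[k]}$;                                   /* Algorithm 2 */
        Set $\widetilde{\Pi}^{[k]} = \frac{1 - \theta_\star^{[k]}}{1 - \theta_\star^{[k-1]}} \widetilde{\Pi}^{[k-1]}$;
        Compute $\nu^{[k]}$;
        for j = 1 to n do
            for i = 1 to n do
                Disable all controllable $q_i \xrightarrow{\sigma} q_j$ s.t. $\nu_j^{[k]} < \nu_i^{[k]}$;
                Enable all controllable $q_i \xrightarrow{\sigma} q_j$ s.t. $\nu_j^{[k]} \geq \nu_i^{[k]}$;
        Collect all disabled transitions in $\mathscr{D}^{[k]}$;
        if $\mathscr{D}^{[k]} == \mathscr{D}^{[k-1]}$ then
            Terminate = true;
        else
            k = k + 1;
    $\mathscr{D}^\star = \mathscr{D}^{[k]}$;                            /* Optimal disabling set */
end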
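The disable/enable loop of Algorithm 1 can be sketched in a few lines. The version below is a simplified illustration rather than the algorithm of [8]: it assumes a fixed, small $\theta$ in place of the adaptive critical bound of Algorithm 2, and the transition encoding `(source, target, probability, controllable)`, the function names, and the three-state example are all hypothetical.

```python
def measure(n, transitions, disabled, chi, theta, tol=1e-12):
    """Language measure of the supervised plant at a fixed theta.

    Disabled transitions are turned into self-loops with the same probability
    (Definition 2), then nu = theta*chi + (1-theta)*Pi*nu is iterated.
    """
    Pi = [[0.0] * n for _ in range(n)]
    for idx, (i, j, p, _) in enumerate(transitions):
        Pi[i][i if idx in disabled else j] += p
    nu = [0.0] * n
    while True:
        new = [theta * chi[i]
               + (1.0 - theta) * sum(Pi[i][j] * nu[j] for j in range(n))
               for i in range(n)]
        if max(abs(a - b) for a, b in zip(new, nu)) < tol:
            return new
        nu = new


def optimal_supervisor(n, transitions, chi, theta=0.01, max_rounds=100):
    """Repeat: compute measures, disable controllable moves into strictly
    worse states, enable the rest; stop when the disabled set is unchanged."""
    disabled = set()
    for _ in range(max_rounds):
        nu = measure(n, transitions, disabled, chi, theta)
        new_disabled = {idx for idx, (i, j, _, controllable)
                        in enumerate(transitions)
                        if controllable and nu[j] < nu[i]}
        if new_disabled == disabled:
            return disabled, nu
        disabled = new_disabled
    raise RuntimeError("policy did not stabilize within max_rounds")


# Hypothetical plant: state 0 moves to a good absorbing state 1 (weight +1)
# or a bad absorbing state 2 (weight -1), each with probability 0.5.
transitions = [(0, 1, 0.5, True), (0, 2, 0.5, True),
               (1, 1, 1.0, False), (2, 2, 1.0, False)]
chi = [0.0, 1.0, -1.0]
disabled, nu = optimal_supervisor(3, transitions, chi)
```

In this example the move into the bad state is disabled after the first round and the policy is then stable; the measure at state 0 rises from 0 to $(1-\theta)/(1+\theta)$.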
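Algorithm 2: Computation of the Critical Lower Bound $\theta_\star$

input : P, $\chi$
output: $\theta_\star$
begin
    Set $\theta_\star = 1$, Set $\theta_{curr} = 0$;
    Compute $\mathscr{P}$, $M_0$, $M_1$, $M_2$;
    for j = 1 to n do
        for i = 1 to n do
            if $(\mathscr{P}\chi)_i - (\mathscr{P}\chi)_j \neq 0$ then
                $\theta_{curr} = \frac{1}{8 M_2} \left| (\mathscr{P}\chi)_i - (\mathscr{P}\chi)_j \right|$
            else
                for r = 0 to n do
                    if $(M_0 \chi)_i \neq (M_0 \chi)_j$ then
                        Break;
                    else
                        if $(M_0 M_1^r \chi)_i \neq (M_0 M_1^r \chi)_j$ then
                            Break;
                if r == 0 then
                    $\theta_{curr} = \frac{\left| \{(M_0 - \mathscr{P})\chi\}_i - \{(M_0 - \mathscr{P})\chi\}_j \right|}{8 M_2}$
                else
                    if r > 0 AND r $\leq$ n then
                        $\theta_{curr} = \frac{\left| (M_0 M_1^r \chi)_i - (M_0 M_1^r \chi)_j \right|}{2^{r+3} M_2}$
                    else
                        $\theta_{curr} = 1$
            $\theta_\star = \min(\theta_\star, \theta_{curr})$
end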



2.1. Relation Of The Centralized Approach To Dynamic Programming.
In spite of underlying Markovian assumptions, the PFSA model is distinct
from the standard formalism of (finite state) Controlled Markov Decision Processes
(CMDP) [331 351 US]. In the latter, control actions are not probabilistic; the
associated control function maps states to unique actions in a deterministic manner, and
the control problem is to decide which of the available control actions should be
executed in each state. On the other hand, in the PFSA formalism, control is exerted by
selectively disabling controllable probabilistic state transitions, and is thus a
probabilistic generalization of supervisory control theory [47]. Note that while in the MDP
framework one specifies which control action to take at a given state, in the PFSA
formalism one specifies which of the available control actions are not allowed at the
current state, and any of the remaining ones can be executed in accordance with their
generation probabilities. Denoting the set of controllable transitions at state $q_k$ as
$\Sigma_C$, and $\phi : Q \to \Sigma_C$ as the control policy mapping the current state $q_k \in Q$ to the
controllable move $\phi(q_k) \in \Sigma_C$ (and assuming that the control action is to dictate the
agent to execute a specific controllable move and is not supervisory in nature), one
can formulate an analogous optimization problem that admits solution within the DP
framework. The transition probabilities for a stationary policy $\phi$ are given by:

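    $\forall q_i, q_j \in Q, \sigma_r \in \Sigma_C, \quad \mathrm{Prob}(q_j \mid q_i, \phi(q_i) = \sigma_r) = \sum_{\sigma \ \mathrm{s.t.}\ (\sigma \in \Sigma \setminus \Sigma_C \cup \{\sigma_r\}) \wedge (\delta(q_i, \sigma) = q_j)} \widetilde{\Pi}(q_i, \sigma)$    (2.2)

Immediate rewards, in DP terminology, can be related to the state characteristic $\chi$:

    $g(q_i, \phi(q_i)) = \chi(q_i)$    (2.3)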

We note that the problem at hand must be solved over an infinite horizon, since
the total number of transitions (i.e. the path length) is not bounded. Identifying
$(1 - \theta)$ as the discount factor, the cost-to-go (to be maximized) for the infinite horizon
discounted cost (DC) problem is given by:

    $J(q_0, \theta) = \lim_{N \to \infty} E\left( \sum_{i=0}^{N-1} (1 - \theta)^i g(q_i, \phi(q_i)) \right)$    (2.4)

and for the infinite horizon Average Cost per stage (AC):

    $J(q_0) = \lim_{N \to \infty} \frac{1}{N} E\left( \sum_{i=0}^{N-1} g(q_i, \phi(q_i)) \right)$    (2.5)

AC is more appropriate, since there is no reason to "discount" events in future. In
the PFSA formalism, we solve the analogous discounted problem at sufficiently small
$\theta$, and guarantee that the solution is simultaneously average cost optimal, primarily
due to the following identity [8]:

    $\lim_{\theta \to 0^+} \theta \sum_{k=0}^{\infty} (1 - \theta)^k \Pi^k \chi = \lim_{N \to \infty} \frac{1}{N} \sum_{j=0}^{N-1} \Pi^j \chi = \mathscr{P}\chi$    (2.6)

where $\mathscr{P}$ is the Cesaro limit of the stochastic matrix $\Pi$. Thus, the proposed technique
can solve the problem by maximizing

    $\nu_\theta = \lim_{\theta \to 0^+} \theta \sum_{k=0}^{\infty} (1 - \theta)^k \Pi^k \chi = \lim_{\theta \to 0^+} \theta [I - (1 - \theta)\Pi]^{-1} \chi$    (2.7)

i.e., the language measure, and achieve maximization of $\mathscr{P}\chi$ (guaranteeing that the
probability of reaching the goal is maximized, while simultaneously minimizing collision
probability). In any case, the formulated DP problem can be solved using standard
solution methodologies such as Value Iteration (VI) or Policy Iteration (PI) jlSl HH].
We note that for VI, we need to search for the control action that maximizes the
value update over all possible control actions, in each iteration. On the other hand,
PI involves two steps in each iteration: (1) policy evaluation, which is very similar
to the measure computation step in each iteration for the language-measure-theoretic
technique, and (2) policy improvement, which involves searching for an improved
action over possible control actions for each state (which involves at least one product
between a $\mathrm{Card}(Q) \times \mathrm{Card}(Q)$ matrix of transition probabilities and the current
cost vector of length $\mathrm{Card}(Q)$ per state). The disabling/enabling of controllable
transitions is significantly simpler compared to the search steps that both VI and PI
require, and the improvement in complexity (for PI, which is closer to the proposed
algorithm in the centralized case, since the latter proceeds via computing a sequence of
monotonically improving policies) is by at least an asymptotic factor of $O(\mathrm{Card}(Q)^2)$
per iteration (for dense matrices, and $O(\mathrm{Card}(Q))$ per iteration in the sparse case),
which is significant for large problems.

Fig. 3.1: Agent-centric local decision-making with non-zero failure probability.
Panels: (a) agent movement towards downstream neighbors, with agent loss probabilities
$\lambda_{01}$ and $\lambda_{02}$; (b) Local Network Model; (c) Controlled Local Model.

Remark 1. The number of iterations is not expected to be comparable for the
PFSA framework, VI and PI techniques; simulations indicate that the measure-theoretic
approach converges faster, and detailed investigation in this direction is a topic of
future work.

Another key advantage of the PFSA-based solution methodology is guaranteed
Blackwell optimality [SDJ 01]. It is well recognized that the average cost criterion is
underselective, namely the finite time behavior is completely ignored. The condition
of Blackwell optimality attempts to correct this by demanding that the computed controller
be optimal for a continuous range of discount factors in the interval $(d_0, 1)$, i.e. for
$\theta \in (0, 1 - d_0)$, where $0 < d_0 < 1$. Since the PFSA-based approach maximizes the
language measure for some $\theta = \theta_{min}$ such that the optimal policy is guaranteed
to be identical for all values of $\theta$ in the range $(0, \theta_{min}]$, the solution satisfies the
Blackwell condition. It is possible to obtain such Blackwell optimal policies within
the DP framework as well, but the approaches are significantly more involved (see
[H], Chapter 8). The ability to adapt $\theta$ at each iteration (see Algorithm 2) leads
to a novel adaptive discounting scheme in the technique proposed, which solves the
infinite horizon problem efficiently while using a non-zero $\theta \geq \theta_{min}$ at all iteration
steps.

3. The Swarm Model. We consider an ad-hoc mobile network of communicating
agents endowed with limited computational resources. For simplicity of exposition,
we develop the theoretical results under the assumption of a single target, or goal.
This is not a serious restriction and can be easily relaxed. The location and identity
of the target are not known a priori to the individual agents; only those which are within
the communication radius of the target can sense its presence. The communication
radii are assumed to be constant throughout. Inter-agent communication links are
assumed to be perfect, which again can be generalized easily within our framework.
We assume agents can efficiently gather the following information:

1. (Set of Neighboring agents:) Number and unique id. of agents to which it can
successfully send data via a 1-hop direct line-of-sight link. The
communication radius $R_c$ is assumed to be identical for each agent. The set of neighbors
for each agent varies with time as the swarm evolves.

2. (Local Navigation Properties:) Navigation is assumed to occur by moving
towards a chosen neighbor with constant velocity, the magnitude of which is
assumed to be identical for each agent. In general, there is a non-zero
probability of agent failure in the course of executing this maneuver, which is assumed
to be either known or learnable by the agents. However, the explicit learning
of these local costs is not addressed in this paper.

The local network model, along with the decision-making philosophy, is illustrated in
Figure 3.1. We will refer to a frozen swarm, which denotes a particular spatial
configuration of the agents assumed to be fixed in time. Unless explicitly mentioned,
the agents are assumed to be updating their positions in continuous time (moving
with constant velocity), while changing their headings in discrete time as dictated via
on-board decision-making based on locally available information, with the objective
of reaching the target with minimum end-to-end probability of agent failure. We
further assume that the failure probabilities mentioned above are functions of the agent
locations, and possibly vary in a smooth non-increasing manner with increasing
inter-agent distances. However, no time-dependence is assumed, i.e. the failure probabilities
remain constant for a frozen swarm, and change (due to the positional updates) for a
mobile one. The agents are assumed to move slowly relative to
the time required for the convergence of the optimization algorithm for each frozen
configuration. The implications of the last assumption will be discussed in the sequel.
First we formalize a failure-prone ad-hoc network of frozen communicating agents as
a probabilistic finite state automaton.

Definition 8 (Neighbor Map For A Frozen Swarm). If $Q$ is the set of all agents
in the network, then the neighbor map $\mathcal{N} : Q \to 2^Q$ specifies, for each agent $q_i \in Q$,
the set of agents $\mathcal{N}(q_i) \subseteq Q$ (excluding $q_i$) to which $q_i$ can communicate via a single
hop direct link.

Definition 9 (Failure Probability). The failure probability $\lambda_{ij} \in [0, 1]$ is defined
to be the probability of unrecoverable loss of agent $q_i$ in the course of moving towards
agent $q_j$.
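For a frozen swarm, the neighbor map of Definition 8 can be sketched directly from agent positions. The snippet below assumes planar Euclidean positions and treats two agents as neighbors when their separation is at most the common communication radius; line-of-sight obstructions are ignored, and the function name `neighbor_map` and the sample coordinates are hypothetical.

```python
import math


def neighbor_map(positions, comm_radius):
    """Return {agent: set of agents within comm_radius, excluding itself}.

    positions   : dict mapping agent id -> (x, y) coordinates
    comm_radius : the communication radius R_c, identical for every agent
    Because R_c is shared, the resulting relation is symmetric.
    """
    ids = list(positions)
    return {
        a: {b for b in ids
            if b != a and math.dist(positions[a], positions[b]) <= comm_radius}
        for a in ids
    }


# Hypothetical frozen configuration of three agents on a line.
positions = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (3.0, 0.0)}
neighbors = neighbor_map(positions, comm_radius=1.5)
```

The failure probabilities of Definition 9 are deliberately left out of this sketch, since the text treats them as given or learned quantities attached to each directed link.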

Thus, $\lambda_{ij}$ reflects local or immediate navigation costs and estimated risks, and
therefore varies with the positional coordinates of the agents $q_i$ and $q_j$. These
quantities are not constrained to be symmetric in general, i.e., $\lambda_{ij} \neq \lambda_{ji}$. We assume the
agent-based estimation of these probabilities to converge fast enough, in the scenario where
such parameters are learned on-line. Since we are more concerned with decision
optimization in this paper, we ignore the parameter estimation problem of learning the
failure probabilities, which is at least intuitively justified by the existence of separated
policies in large classes of similar problems.

We visualize the local network around an agent $q_0$ in the manner illustrated in
Figure 3.1(a) (shown for two neighbors $q_1$ and $q_2$). In particular, agent $q_0$ attempting to
move towards the current position of $q_1$ experiences a failure probability $\lambda_{01}$, while
moving towards $q_2$ has a failure probability $\lambda_{02}$. To correctly represent this
information, we require the notion of virtual states ($q_{(01)}$, $q_{(02)}$ in Figure 3.1(b)).

Fig. 3.2: 6-agent network & 23 state PFSA (16 virtual states, 6 agents, 1 dump state).
Panels: 6-agent Frozen Network; Corresponding PFSA model with virtual states
(legend: controllable transitions, virtual states).

Remark 2 (Necessity Of Virtual States). The virtual states are required to model
the physical situation within the PFSA framework, in which transitions do not emerge
from other transitions. As illustrated in Figure 3.1(a), the failure events do actually
occur in the course of the attempted maneuver, hence necessitating the notion of the
virtual states.

Definition 10 (Virtual State). Given an agent $q_i$, and a neighbor $q_j \in \mathcal{N}(q_i)$
with a specified failure probability $\lambda_{ij}$, any attempted move towards $q_j$ is assumed to
be first routed to a virtual state $q_{(ij)}$, upon which there is either an automatic (i.e.
uncontrollable) forwarding to $q_j$ with probability $1 - \lambda_{ij}$, or a failure with probability
$\lambda_{ij}$. The set of all virtual states in a network of $Q$ agents is denoted by $Q^v$ in the
sequel. Hence, the total number of virtual states is given by:

    $\mathrm{Card}(Q^v) = \sum_{i : q_i \in Q} \mathrm{Card}(\mathcal{N}(q_i))$    (3.1)

And the cardinality of the set of virtual states satisfies:

    $0 \leq \mathrm{Card}(Q^v) \leq \mathrm{Card}(Q)^2 - \mathrm{Card}(Q)$    (3.2)

We assume that there is a static agent at the target or the goal, which we denote
as $q_{TGT}$. The local communication with this agent-at-target can be visualized as the
process of sensing the target by the mobile agents. We are ready to model an ad-hoc
communicating network of frozen agents as a PFSA, whose states correspond to either
agents, the virtual states, or the state reflecting agent failures.

Definition 11 (PFSA Model of Frozen Network). For a given set of agents
$Q$, the function $\mathcal{N} : Q \to 2^Q$, the link specific failure probabilities $\lambda_{ij}$ for any
agent $q_i$ and a neighbor $q_j \in \mathcal{N}(q_i)$, and a specified target $q_{TGT} \in Q$, the PFSA
$G_N = (Q^N, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$ is defined to be a model of the network, where (denoting
$\mathrm{Card}(\mathcal{N}(q_i)) = m$):

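o States:  $Q^N = Q \cup Q^v \cup \{q_D\}$    (3.3a)

where $Q^v$ is the set of virtual states, and $q_D$ is a dump state which models loss of
agent due to failure.

o Alphabet:  $\Sigma = \Big( \bigcup_{i : q_i \in Q} \ \bigcup_{j : q_j \in \mathcal{N}(q_i)} \sigma_{ij} \Big) \cup \{\sigma_D\}$    (3.3b)

where $\sigma_{ij}$ denotes navigation (attempted or actual) from $q_i$ to $q_j$, and $\sigma_D$ denotes agent failure.

o Transition Map:

    $\delta(q, \sigma) = \begin{cases} q_{(ij)} & \text{if } q = q_i, \sigma = \sigma_{ij} \\ q_j & \text{if } q = q_{(ij)}, \sigma = \sigma_{ij} \\ q_D & \text{if } q = q_{(ij)}, \sigma = \sigma_D \\ q_D & \text{if } q = q_D, \sigma = \sigma_D \\ \text{undefined} & \text{otherwise} \end{cases}$    (3.3c)

o Probability Morph Matrix:

    $\widetilde{\Pi}(q, \sigma) = \begin{cases} \frac{1}{m} & \text{if } q = q_i, \sigma = \sigma_{ij} \\ 1 - \lambda_{ij} & \text{if } q = q_{(ij)}, \sigma = \sigma_{ij} \\ \lambda_{ij} & \text{if } q = q_{(ij)}, \sigma = \sigma_D \\ 1 & \text{if } q = q_D, \sigma = \sigma_D \\ 0 & \text{otherwise} \end{cases}$    (3.3d)

o Characteristic Weights:

    $\chi_i = \begin{cases} 1 & \text{if } q_i = q_{TGT} \\ -1 & \text{if } q_i = q_D \\ 0 & \text{otherwise} \end{cases}$    (3.3e)

o Controllable Transitions:  $\forall q_i \in Q, q_j \in \mathcal{N}(q_i), \quad q_i \xrightarrow{\sigma_{ij}} q_{(ij)} \in \mathscr{C}$    (3.3f)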

We note that for a network of $Q$ agents, the PFSA model may have (and almost always
has, see Figure 3.2) a significantly larger number of states. Using Eq. (3.2):

    $\mathrm{Card}(Q) + 1 \leq \mathrm{Card}(Q^N) \leq \mathrm{Card}(Q)^2 + 1$    (3.4)
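The construction of Definition 11 and the state count that follows from it can be sketched as below. The encoding is an assumption made for illustration: agents are integer ids, a virtual state is the tuple `("v", i, j)`, the navigation symbol is `("nav", i, j)`, and the failure symbol and dump state are the strings `"fail"` and `"DUMP"`. The six-agent topology and the uniform failure probability of 0.1 are hypothetical; the topology is chosen only to reproduce the counts quoted in the caption of Fig. 3.2 (16 virtual states, 23 states in total), not its actual layout.

```python
def build_network_pfsa(neighbors, failure, target):
    """Assemble states, transition map, morph function, characteristic
    weights and controllable transitions of the frozen-network PFSA."""
    virtual = [("v", i, j) for i in neighbors for j in sorted(neighbors[i])]
    states = list(neighbors) + virtual + ["DUMP"]
    delta, morph, controllable = {}, {}, set()
    for i, nbrs in neighbors.items():
        m = len(nbrs)  # an agent with no neighbors gets no outgoing moves
        for j in nbrs:
            v, sym = ("v", i, j), ("nav", i, j)
            # Controllable attempt: agent i heads for neighbor j.
            delta[(i, sym)] = v
            morph[(i, sym)] = 1.0 / m
            controllable.add((i, sym))
            # Uncontrollable outcome: arrive at j, or be lost to the dump state.
            delta[(v, sym)] = j
            morph[(v, sym)] = 1.0 - failure[(i, j)]
            delta[(v, "fail")] = "DUMP"
            morph[(v, "fail")] = failure[(i, j)]
    delta[("DUMP", "fail")] = "DUMP"
    morph[("DUMP", "fail")] = 1.0
    chi = {q: 0.0 for q in states}
    chi[target] = 1.0
    chi["DUMP"] = -1.0
    return states, delta, morph, chi, controllable


# Hypothetical 6-agent frozen network with 8 undirected links (16 directed).
neighbors = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1, 4},
             3: {1, 4, 5}, 4: {2, 3, 5}, 5: {3, 4}}
failure = {(i, j): 0.1 for i in neighbors for j in neighbors[i]}
states, delta, morph, chi, controllable = build_network_pfsa(
    neighbors, failure, target=5)
```

Only the attempts that leave an agent are controllable; the success or failure of a move, decided at the virtual state, is not.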



This state-explosion will not be a problem for the distributed approach developed in
the sequel, since we use the complete model only for the purpose of deriving
theoretical guarantees. Note that Definition 11 generates a PFSA model which can be
optimized in a straightforward manner using the language-measure-theoretic technique
described in Section 2 (see [5] for details). This would yield the optimal routing policy
in terms of the disabling decisions at each agent that minimize source-to-target failure
probabilities (from every agent in the network). To see this explicitly, note that the
measure-theoretic approach elementwise maximizes
$\lim_{\theta \to 0^+} \theta [\mathbb{I} - (1-\theta)\Pi]^{-1} \chi = \mathscr{P}\chi$,
where the $i$th row of $\mathscr{P}$ (denoted as $\wp^i$) is the stationary probability vector for the
PFSA initialized at state $q_i$ (see Proposition 1). Since the dump state has
characteristic $-1$, the target has characteristic 1, and all other agents have characteristic
0, it follows that this optimization maximizes the quantity $\wp^i_{TGT} - \wp^i_{DUMP}$ for every
source state or agent $q_i$ in the network. Note that $\wp^i_{TGT}$, $\wp^i_{DUMP}$ are the stationary
probabilities of reaching the target and incurring an agent loss to dump respectively,
from a given source $q_i$. Thus, maximizing $\wp^i_{TGT} - \wp^i_{DUMP}$ for every $q_i \in Q$ guarantees
that the computed routing policy is indeed optimal in the stated sense. However, the
procedure in [8] requires centralized computations, which is precisely what we wish
to avoid. The key technical contribution in this paper is to develop a distributed
approach to language-measure-theoretic PFSA optimization. In effect, the theoretical
development in the next section allows us to carry out the language-measure-theoretic
optimization of a given PFSA in situations where we do not have access
to the complete $\Pi$ matrix or the $\chi$ vector at any particular agent (i.e. each agent
has a limited local view of the network), and are restricted to communicate only with
immediate neighbors. We are interested in not just computing the measure vector in
a distributed manner, but in optimizing the PFSA via selective disabling of controllable
transitions (see Section 2). This is accomplished by Algorithm 3.
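The limit quoted above can be illustrated numerically. The following sketch solves $\nu = (1-\theta)\Pi\nu + \theta\chi$ by plain fixed-point iteration on a hand-built four-state chain; the helper name `measure`, the toy chain, and the characteristic values (target 1, dump $-1$, as quoted in the paragraph above) are assumptions chosen for illustration only.

```python
def measure(pi, chi, theta, tol=1e-13):
    """Fixed-point solve of nu = (1 - theta) * pi @ nu + theta * chi."""
    n = len(chi)
    nu = [0.0] * n
    while True:
        new = [(1 - theta) * sum(pi[i][j] * nu[j] for j in range(n)) + theta * chi[i]
               for i in range(n)]
        if max(abs(x - y) for x, y in zip(new, nu)) < tol:
            return new
        nu = new


# States: 0 = agent a, 1 = virtual state for link a -> target, 2 = target, 3 = dump.
pi = [[0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 0.9, 0.1],
      [0.0, 0.0, 1.0, 0.0],
      [0.0, 0.0, 0.0, 1.0]]
chi = [0.0, 0.0, 1.0, -1.0]

for theta in (0.1, 0.01):
    print(theta, measure(pi, chi, theta)[0])
```

For this chain the measure of agent a equals $0.8(1-\theta)^2$, so it approaches $0.9 - 0.1 = 0.8$, the difference between the probabilities of reaching the target and of being lost, as $\theta$ shrinks.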

3.1. Control Approach For Mobile Agents. For the mobile network, $\mathbb{G}_N(t)$
varies as a function of operation time $t$. For a particular instant $t = t_0$, the globally
optimized model $\mathbb{G}^\star_N(t_0)$ yields the local decisions for the agent maneuvers. As stated
before, this global optimization can be carried out in a distributed manner, and each
agent updates its heading towards the neighbor which has the highest measure
among all neighbors, provided it is in fact higher than the self-measure. Moving
towards any neighbor with a better or equal measure compared to self, chosen at random,
is also acceptable, but we use the former approach for the theoretical
development in the sequel. The movement however modifies the PFSA model to
$\mathbb{G}_N(t_0 + \Delta t)$, and a re-optimization is required. We assume, as stated before, that the
agent velocities are slow enough so that they do not interfere with this computation. A crucial
point is the time complexity of convergence of the distributed algorithm, which in
our case is small enough to allow this procedure to be carried out efficiently. Also,
note that since the complete model is never assembled, modifications to $\mathbb{G}_N$ are
also a local affair, e.g. updating the set of neighbors or the failure probabilities.
Such local effects are felt by remote agents via percolated information involving an
unavoidable delay, which ensures that the effects of local changes are not all felt
simultaneously across the network.
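The heading rule described in this subsection can be sketched as follows. This is a minimal illustration under stated assumptions: positions are taken to be planar coordinates, and the function name `choose_heading` and its argument layout are hypothetical, not taken from the paper.

```python
import math


def choose_heading(self_pos, self_measure, neighbors):
    """Return a unit heading towards the best neighbour, or None to hold position.

    neighbors is a list of ((x, y), measure) pairs obtained by local queries.
    """
    if not neighbors:
        return None
    best_pos, best_measure = max(neighbors, key=lambda entry: entry[1])
    if best_measure <= self_measure:
        return None  # no neighbour is strictly better than self
    dx, dy = best_pos[0] - self_pos[0], best_pos[1] - self_pos[1]
    norm = math.hypot(dx, dy)
    if norm == 0.0:
        return None
    return (dx / norm, dy / norm)


print(choose_heading((0.0, 0.0), 0.2, [((3.0, 4.0), 0.5), ((1.0, 0.0), 0.3)]))
```

After each move the neighbor sets and failure probabilities change, so the measures must be recomputed before the rule is applied again.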

3.2. Possibility Of Different Local Models. The local model, as described
above and illustrated in Figure 3.1, assumes that errors are non-recoverable; hence
the possibility of transitioning to the dump state, from which no outward transition
is defined. Alternatively, we could eliminate the dump state, and simply add the
transition as a self-loop, or even redistribute the probability among the remaining
transitions defined at a state. It is intuitively clear that the adopted model avoids
errors most aggressively.

4. Decentralized PFSA Optimization For The Frozen Swarm. In the
sequel, the current measure value, for a given $\theta$, at agent $q_i \in Q$ is denoted as $\hat{\nu}_\theta|_i$,
and the measure of the virtual state $q^v_{ij} \in Q_N$ is denoted as $\hat{\nu}_\theta|_{(q^v_{ij})}$. The parenthesized
entry $(q^v_{ij})$ denotes the index of the virtual state $q^v_{ij}$ in the state set $Q_N$. Similarly,
the transition probability from $q_i$ to $q^v_{ij}$ is denoted as $\Pi_{i(q^v_{ij})}$. The subscript entry
$i(q^v_{ij})$ denotes the $ik$th element of $\Pi$, where $k = (q^v_{ij})$.

Algorithm 3 establishes a distributed, asynchronous procedure achieving:

    $\forall q_i \in Q, \quad \hat{\nu}_\theta|_i \xrightarrow{\text{global convergence}} \nu^\star_\theta|_i$    (4.1)

where $\nu^\star_\theta|_i$ is the optimal measure for $q_i \in Q$ that would be obtained by optimizing
the PFSA $\mathbb{G}_N$, for a given $\theta$, in a centralized approach (see Section 2). The optimal
routing policy can then be obtained by moving towards neighboring agents which have
a better or equal current measure value. If more than one such neighbor is available,
then one either chooses the local destination agent randomly, in an equiprobable
manner, or, as we do in this paper, moves towards one chosen from the set of neighbors
with maximal measure. In the sequel we show that this forwarding policy converges
to the globally optimal routing policy, i.e., that for a sufficiently small $\theta$ it maximizes
the probability of reaching the target while simultaneously minimizing the probability of
end-to-end failures. Furthermore, choosing randomly between qualifying neighboring
agents leads to significant congestion resilience. These issues are elaborated in
the sequel (Proposition 7).

Algorithm 3 has four distinct parts, marked as (a1), (a2), (a3) and (a4). Part
(a1) involves inter-agent communication, to enable a particular agent $q_i \in Q$ to
ascertain the current measure values of neighboring agents, and the failure probabilities
$\lambda_{ij}$ on respective links. Recall that we assume the probabilities $\lambda_{ij}$ to be more or
less constant for the frozen swarm; however, agents need to estimate these values for
generalization to the mobile case. Part (a2) is the control adaptation, in which the
agents decide, based on local information, the set of allowable destination agents.
Part (a3) is the computation of the updated measure values for the virtual states $q^v_{ij}$,
where $j : q_j \in \mathcal{N}(q_i)$. Finally, part (a4) updates the measure of the agent based on
the computed current measures of the virtual states. We note that Algorithm 3 only
uses information that is either available locally or that can be queried from
neighboring agents.

Algorithm 3: Distributed Update of Agent Measures In Frozen Swarm

input : $\mathbb{G}_N = (Q, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$, $\theta$
begin
    Initialize $\forall q_i \in Q$, $\hat{\nu}_\theta|_i = 0$
    /* Begin Infinite Asynchronous Loop */
    while true do
        for each agent $q_i \in Q$ do
            if $\mathcal{N}(q_i) \neq \emptyset$ then
                $m = \mathrm{Card}(\mathcal{N}(q_i))$
                for each agent $q_j \in \mathcal{N}(q_i)$ do
                    /* (a1) Inter-agent Communication */
                    Query $\hat{\nu}_\theta|_j$ & Failure Prob. $\lambda_{ij}$
                    /* (a2) Control Adaptation */
                    if $\hat{\nu}_\theta|_j < \hat{\nu}_\theta|_i$ then
                        $\Pi_{ii} = \Pi_{ii} + \Pi_{i(q^v_{ij})}$
                        $\Pi_{i(q^v_{ij})} = 0$    /* Disable */
                    else
                        if $\Pi_{i(q^v_{ij})} = 0$ then
                            $\Pi_{i(q^v_{ij})} = 1/m$
                            $\Pi_{ii} = \Pi_{ii} - 1/m$    /* Enable */
                        endif
                    endif
                    /* (a3) Updating Virtual States */
                    $\hat{\nu}_\theta|_{(q^v_{ij})} = (1-\theta)(1-\lambda_{ij}) \hat{\nu}_\theta|_j$
                endfor
            endif
            /* (a4) Updating Agent */
            $\hat{\nu}_\theta|_i = \sum_{j: q_j \in \mathcal{N}(q_i)} (1-\theta) \Pi_{i(q^v_{ij})} \hat{\nu}_\theta|_{(q^v_{ij})} + (1-\theta)\Pi_{ii}\hat{\nu}_\theta|_i + \theta\chi|_i$
        endfor
    endw
end
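The four parts (a1) to (a4) can be sketched in Python as below. This is a simplified illustration rather than the authors' implementation: the asynchronous loop is replaced by a fixed number of round-robin sweeps, an agent without neighbors is assumed to carry a self-loop of probability 1, and the names `distributed_measures`, `sweeps`, and the toy network are hypothetical.

```python
def distributed_measures(neighbors, fail, target, theta, sweeps=2000):
    """Round-robin sketch of Algorithm 3; returns (measures, enabled neighbour sets)."""
    nu = {q: 0.0 for q in neighbors}                     # initialise all measures to 0
    enabled = {q: set(neighbors[q]) for q in neighbors}  # every transition starts enabled
    for _ in range(sweeps):
        for i in neighbors:
            m = len(neighbors[i])
            virtual = {}
            for j in neighbors[i]:
                nu_j, lam = nu[j], fail[(i, j)]          # (a1) query neighbour
                if nu_j < nu[i]:                         # (a2) control adaptation
                    enabled[i].discard(j)                #      disable
                else:
                    enabled[i].add(j)                    #      enable
                virtual[j] = (1 - theta) * (1 - lam) * nu_j  # (a3) virtual state
            chi_i = 1.0 if i == target else 0.0
            if m > 0:
                forward = sum(virtual[j] for j in enabled[i]) / m
                self_loop = (m - len(enabled[i])) / m    # disabled mass becomes a self-loop
            else:
                forward, self_loop = 0.0, 1.0            # assumption: isolated agent self-loops
            nu[i] = (1 - theta) * (forward + self_loop * nu[i]) + theta * chi_i  # (a4)
    return nu, enabled


neighbors = {"a": ["b", "T"], "b": ["a", "T"], "T": []}
fail = {("a", "b"): 0.0, ("a", "T"): 0.5, ("b", "a"): 0.0, ("b", "T"): 0.0}
theta = 0.05
nu, enabled = distributed_measures(neighbors, fail, "T", theta)
print(nu, enabled)
```

In this toy network agent b has a reliable link to the target, so it ends with a higher measure than a and disables its transition towards a, while a keeps both of its neighbors enabled.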

Proposition 2 (Convergence). For a network $Q$ modeled as $\mathbb{G}_N = (Q_N, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$,
the distributed procedure in Algorithm 3 has the following properties:

1. Computed measure values for every agent $q_i \in Q$ are non-negative and bounded
above by 1, i.e.,

    $\forall q_i \in Q_N, \ \forall t \in [0, \infty), \quad \hat{\nu}^t_\theta|_i \in [0, 1]$    (4.2)

2. For constant failure probabilities and constant neighbor map $\mathcal{N} : Q \to 2^Q$,
Algorithm 3 converges in the sense:

    $\forall q_i \in Q_N, \quad \lim_{t \to \infty} \hat{\nu}^t_\theta|_i = \nu^\infty_\theta|_i \in [0, 1]$    (4.3)

3. Convergent measure values coincide with the optimal values computed by the
centralized approach:

    $\forall q_i \in Q_N, \quad \nu^\infty_\theta|_i = \nu^\star_\theta|_i$    (4.4)

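Proof. (Statement 1:) Non-negativity of the measure values is obvious. For
establishing the upper bound, we use induction on computation time $t$. We note that
all the measure values $\hat{\nu}^t_\theta|_i$ are initialized to 0 at time $t = 0$. The first agent to change
its measure will be the target, which is updated at some time $t = t_0$:

    $\hat{\nu}^{t_0}_\theta|_{(q_{TGT})} = 0 + \theta \chi|_{(q_{TGT})} = \theta$    (4.5)

where the first term is zero since all agents still have measure zero and the target
characteristic $\chi|_{(q_{TGT})} = 1$. Thus, there exists a non-trivial time instant $t_0$, at which:

    (Induction Basis)    $\forall q_i \in Q_N, \quad \hat{\nu}^{t_0}_\theta|_i \leq 1$    (4.6)

Next we assume for time $t = t'$, we have

    (Induction Hypothesis)    $\forall q_i \in Q_N, \ \forall \tau \leq t', \quad \hat{\nu}^\tau_\theta|_i \leq 1$

We consider the next updates for physical agents and virtual states separately, and
denote the time instant for the next updates as $t'_+$. Note that $t'_+$ may actually be
different for different agents (asynchronous operation).

(Virtual States) For any virtual state $q_i = q^v_{kj} \in Q_N$, where $q_k, q_j \in Q$, we have:

    $\hat{\nu}^{t'_+}_\theta|_i = (1-\theta)(1-\lambda_{kj}) \hat{\nu}^{t'}_\theta|_j \leq 1$

(Physical Agents) For any $q_i \in Q$, with the set of enabled neighbors
$E_n = \{ q_j \in \mathcal{N}(q_i) \text{ s.t. } \hat{\nu}^{t'_+}_\theta|_{(q^v_{ij})} \geq \hat{\nu}^{t'}_\theta|_i \}$:

    $\hat{\nu}^{t'_+}_\theta|_i = \frac{1-\theta}{\mathrm{Card}(\mathcal{N}(q_i))} \Big( \sum_{j: q_j \in E_n} \hat{\nu}^{t'_+}_\theta|_{(q^v_{ij})} + \sum_{j: q_j \in \mathcal{N}(q_i) \setminus E_n} \hat{\nu}^{t'}_\theta|_i \Big) + \theta \chi|_i \leq (1-\theta) + \theta = 1$

which establishes Statement 1.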

(Statement 2:) We claim that for each agent $q_i \in Q_N$, the sequence of measures $\hat{\nu}^t_\theta|_i$
forms a monotonically non-decreasing sequence as a function of the computation time
$t$. Again, we use induction on computation time. Considering the time instant $t_0$
(see Eq. (4.5)), we note that we have an instant up to which all measure values have
indeed changed in a non-decreasing fashion, since the measure of $q_{TGT}$ increased to
$\theta$, while other agents are still at 0; this establishes the basis. For our hypothesis,
we assume that there exists some time instant $t' > t_0$, such that all measure values
have undergone non-decreasing updates up to $t'$. We consider the physical agent
$q_i \in Q$ which is the first one to update next, say at the instant $t'_+ > t'$. Referring to
Algorithm 3, this update occurs by first updating the set of virtual states
$\{ q^v_{ij} : q_j \in \mathcal{N}(q_i) \}$. Since virtual states update as:

    $\hat{\nu}^{t'_+}_\theta|_{(q^v_{ij})} = (1-\theta)(1-\lambda_{ij}) \hat{\nu}^{t'}_\theta|_j$    (4.8)

it follows from the induction hypothesis that

    $\hat{\nu}^{t'_+}_\theta|_{(q^v_{ij})} \geq \hat{\nu}^{t'}_\theta|_{(q^v_{ij})}$    (4.9)

If the connectivity (i.e. the forwarding decisions) for the physical agent $q_i$ remains
unchanged between the instants $t'$ and $t'_+$, then, since the measure of no neighboring agent
has decreased (by the induction hypothesis):

    $\hat{\nu}^{t'_+}_\theta|_i \geq \hat{\nu}^{t'}_\theta|_i$    (4.10)

If, on the other hand, the set of disabled transitions for $q_i$ changes (e.g. for some
$q_j \in \mathcal{N}(q_i)$, $q_i \to q^v_{ij}$ was disabled at $t'$ and is enabled at $t'_+$, or vice versa), the
measure of agent $q_i$ is increased by a non-negative additive term with prefactor
$(1-\theta)/\mathrm{Card}(\mathcal{N}(q_i))$,
which completes the inductive process and establishes our claim that the measure
values form a non-decreasing sequence for each agent as a function of the computation
time. Since a non-decreasing bounded sequence in a complete space must converge
to a unique limit, the convergence:

    $\forall q_i \in Q_N, \quad \lim_{t \to \infty} \hat{\nu}^t_\theta|_i = \nu^\infty_\theta|_i \in [0, 1]$    (4.11)

follows from the existence of the upper bound established in Statement 1. This
establishes Statement 2.

(Statement 3:) From the update equations in Algorithm 3, we note that the limiting
measure values satisfy:

    $\nu^\infty_\theta = \theta [\mathbb{I} - (1-\theta)\Pi]^{-1} \chi$    (4.12)

which implies that the measure values do indeed converge to the measure vector
computed in a centralized fashion (see Eq. (2.1)). Noting that any further disabling (or
re-enabling) would not increase the measure values computed by Algorithm 3, we
conclude that this must be the optimal disabling set that would be obtained by the
centralized language-measure-theoretic optimization of PFSA $\mathbb{G}_N$ (Section 2). This
completes the proof. □

Proposition 3 (Initialization Independence). For a network $Q$ modeled as a
PFSA $\mathbb{G}_N = (Q_N, \Sigma, \delta, \widetilde{\Pi}, \chi, \mathscr{C})$, convergence of Algorithm 3 is independent of the
initialization of the measure values, i.e., if $\tilde{\nu}^t_\theta$ denotes the measure vector at time $t$
with arbitrary initialization $\alpha \in [0, 1]^{\mathrm{Card}(Q)}$, then:

    $\lim_{t \to \infty} \tilde{\nu}^t_\theta = \lim_{t \to \infty} \hat{\nu}^t_\theta$    (4.13)

where $\tilde{\nu}^0_\theta = \alpha$ and $\hat{\nu}^0_\theta = [0 \cdots 0]^T$.
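Proof. The measure update equations in Algorithm 3 dictate that the measure
values will have a positive contribution from $\alpha$. Denoting the contribution of $\alpha$ to the
measure of agent $q_i \in Q$ at time $t$ as $C^t_\alpha(q_i)$, we note that the measure can be written
as $\tilde{\nu}^t_\theta|_i = C^t_\alpha(q_i) + f_i$, where $f_i$ is independent of $\alpha$. Furthermore, the linearity of the
updates implies that $C^t_\alpha(q_i)$ can be used to formulate an inductive argument as follows.
We use $k^t_\star \in \mathbb{N} \cup \{0\}$ to denote the minimum number of updates that every agent in
the network has encountered up to time instant $t \in [0, \infty)$. We claim that:

    $\forall q_i \in Q, \ \forall t \in [0, \infty), \quad C^t_\alpha(q_i) \leq (1-\theta)^{k^t_\star} \|\alpha\|_1$    (4.14)

To establish this claim, we use induction on $k^t_\star$. For the basis, we note that there
exists a time instant $t^0$, such that $\forall \tau \leq t^0$, $k^\tau_\star = 0$, implying that

    $\forall \tau \leq t^0, \quad C^\tau_\alpha(q_i) = \alpha_i \leq (1-\theta)^0 \sum_{q_j \in Q} \alpha_j = (1-\theta)^0 \|\alpha\|_1$

We assume that if at some $t_k$, $k^{t_k}_\star = k \in \mathbb{N}$, then:

    (Induction Hypothesis)    $\forall q_i \in Q, \quad C^{t_k}_\alpha(q_i) \leq (1-\theta)^k \|\alpha\|_1$

Next let $q_i$ be an arbitrary physical agent, and consider the first update of $q_i$ at
some instant $t_{k+} > t_k$:

    $\tilde{\nu}^{t_{k+}}_\theta|_i = \sum_{j: q_j \in \mathcal{N}(q_i)} (1-\theta) \Pi_{i(q^v_{ij})} \tilde{\nu}^{t_k}_\theta|_{(q^v_{ij})} + (1-\theta) \Pi_{ii} \tilde{\nu}^{t_k}_\theta|_i + \theta \chi_i$

    $\Rightarrow C^{t_{k+}}_\alpha(q_i) \leq \sum_{j: q_j \in \mathcal{N}(q_i)} (1-\theta) \Pi_{i(q^v_{ij})} (1-\lambda_{ij})(1-\theta)(1-\theta)^k \|\alpha\|_1 + (1-\theta) \Pi_{ii} (1-\theta)^k \|\alpha\|_1$

    $\Rightarrow C^{t_{k+}}_\alpha(q_i) \leq (1-\theta)^{k+1} \|\alpha\|_1$

where the term $\theta\chi_i$ does not depend on $\alpha$ and therefore does not enter $C^{t_{k+}}_\alpha(q_i)$, and
the last step uses $(1-\lambda_{ij})(1-\theta) \leq 1$ together with the fact that the $i$th row of $\Pi$ sums to 1.
We note that if $k^{t_{k+1}}_\star = k + 1$, then every agent $q_i \in Q$ must have undergone one more
update since $t_k$, implying:

    $\forall q_i \in Q, \quad C^{t_{k+1}}_\alpha(q_i) \leq (1-\theta)^{k+1} \|\alpha\|_1$    (4.15)

which completes the induction proving Eq. (4.14). Observing that $\lim_{t \to \infty} k^t_\star = \infty$,
and $\|\alpha\|_1 < \infty$, we conclude:

    $\forall q_i \in Q, \quad \lim_{t \to \infty} C^t_\alpha(q_i) = 0$    (4.16)

which immediately implies Eq. (4.13). □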
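The decay of the initialization's contribution can be observed numerically. The sketch below makes two simplifying assumptions that are not part of the proposition: updates are synchronous and the disabling pattern (hence $\Pi$) is held fixed. The helper name `run` and the three-state chain are hypothetical.

```python
def run(pi, chi, theta, nu0, steps):
    """Apply nu <- (1 - theta) * pi @ nu + theta * chi for a fixed number of steps."""
    nu = list(nu0)
    n = len(nu)
    for _ in range(steps):
        nu = [(1 - theta) * sum(pi[i][j] * nu[j] for j in range(n)) + theta * chi[i]
              for i in range(n)]
    return nu


# Fixed (already optimised) transition matrix; the last state is the target.
pi = [[0.5, 0.5, 0.0],
      [0.0, 0.0, 1.0],
      [0.0, 0.0, 1.0]]
chi = [0.0, 0.0, 1.0]
theta, alpha = 0.1, [0.9, 0.2, 0.4]

for k in (1, 10, 100):
    gap = max(abs(x - y) for x, y in zip(run(pi, chi, theta, alpha, k),
                                          run(pi, chi, theta, [0.0] * 3, k)))
    print(k, gap, (1 - theta) ** k * sum(alpha))
```

The printed gap between the two runs stays below $(1-\theta)^k \|\alpha\|_1$, mirroring the bound in Eq. (4.14).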

Next we investigate the performance of the proposed approach, and establish
guarantees on global performance achieved via local decisions dictated by Algorithm 3.
We need some technical lemmas, and the notions of strongly absorbing graphs and
graph powers.

Definition 12 (Exact Power of Graph). For a given graph $G = (V, E)$, the exact
power $G^d$, for $d \in \mathbb{N}$, is a graph $(V, E^d)$, such that $(q_i, q_j)$ is an edge in $G^d$ only if
there exists a sequence of edges of length exactly $d$ from agent $q_i$ to agent $q_j$ in $G$.

Definition 13 (Strongly Absorbing Graph). A finite directed graph $G = (V, E)$
($V$ is the set of agents and $E \subseteq V \times V$ the set of edges) is defined to be strongly
absorbing (SA), if:

1. There are one or more absorbing agents, i.e., $\exists A \subsetneq V$, s.t. every agent in $A$
(non-empty) is absorbing.

2. There exists at least one sequence of edges from any agent to one of the
absorbing agents in $A$.

3. If $E^d$ denotes the set of edges for the $d$th exact power of $G$, then, for distinct
agents $q_i, q_j \in V$,

    $(q_i, q_j) \in E \Rightarrow \forall d \in \mathbb{N}, \ (q_j, q_i) \notin E^d$    (4.17)

Lemma 1 (Properties of SA Graphs). Given a SA graph $G = (V, E)$, with $A \subsetneq V$
the absorbing set:

1. The power graph $G^d$ is SA for every $d \in \mathbb{N}$.

2. $q \notin A \Rightarrow \exists q' \in V \setminus \{q\}$ s.t. $(q', q) \notin E$

3. $\exists d \in \mathbb{N} \ \big( \forall q \in V \setminus A \ ( \exists q' \in A \ ( (q, q') \in E^d ) ) \big)$

Proof. Statement 1 is immediate from Definition 13. Statement 2 follows
immediately from noting:

    $q \notin A \Rightarrow \exists q' \in V \setminus \{q\}$ s.t. $(q, q') \in E \Rightarrow (q', q) \notin E$

Statement 3 follows, since from each agent there is a path (length bounded by
$\mathrm{Card}(V)$) to an absorbing state. □
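Definition 13 can be checked mechanically on small graphs. The sketch below is illustrative only and rests on two stated assumptions: an agent is treated as absorbing when it has no edge to a different agent, and condition 3 is read as "no directed cycle passes through two or more distinct agents" (self-loops are allowed). The function name `is_strongly_absorbing` is hypothetical.

```python
def is_strongly_absorbing(nodes, edges):
    """Check the three conditions of Definition 13 on a finite directed graph."""
    succ = {v: set() for v in nodes}
    for u, w in edges:
        if u != w:                      # self-loops do not affect the SA property
            succ[u].add(w)
    absorbing = {v for v in nodes if not succ[v]}
    if not absorbing:                   # condition 1: at least one absorbing agent
        return False
    # Condition 3: Kahn's topological sort succeeds iff no cycle joins distinct agents.
    indeg = {v: 0 for v in nodes}
    for u in nodes:
        for w in succ[u]:
            indeg[w] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for w in succ[u]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # Condition 2 then holds automatically: in a finite acyclic graph every
    # walk along non-self-loop edges ends at an agent with no outgoing edge.
    return seen == len(nodes)


nodes = ["a", "b", "T"]
edges = [("a", "a"), ("a", "b"), ("b", "T"), ("T", "T")]
print(is_strongly_absorbing(nodes, edges))
print(is_strongly_absorbing(nodes, edges + [("b", "a")]))
```

The first graph is strongly absorbing; adding the back edge from b to a creates a cycle between distinct agents and violates condition 3.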

The performance of such control policies, and particularly the convergence
time-complexity, is closely related to the spectral gap of the induced Markov chains. Hence
we need to compute lower bounds on the spectral gap of the chains arising in the
context of the proposed optimization, which (as we shall see later) have the strongly
absorbing property. The following result computes such a bound as a simple function
of the non-unity diagonal entries of $\Pi$.

Proposition 4 (Spectral Bound). Given an $n$-state PFSA $G = (Q, \Sigma, \delta, \widetilde{\Pi})$ with
a strongly absorbing graph, the magnitude of non-unity eigenvalues of the transition
matrix $\Pi$ is bounded above by the maximum non-unity diagonal entry of $\Pi$.

Proof. Without loss of generality, we assume that $G$ has a single absorbing state
(distinct absorbing states can be merged without affecting non-unity eigenvalues).
Now, $\mu$ is an eigenvalue of $\Pi$ iff $\mu^d$ is an eigenvalue of $\Pi^d$, $d \in \mathbb{N}$. From Lemma 1:

C1 $\exists d \in \mathbb{N}$ s.t. $\Pi^d$ has no zero entry in the column corresponding to the absorbing
state. Let $d_\star$ be the smallest such integer.

C2 Every non-absorbing state has at least one zero element in the corresponding
column of $\Pi^{d_\star}$.

C3 Statements C1, C2 are true for any integer $d \geq d_\star$.

We denote the column of ones as $e$, i.e., $e = [1 \cdots 1]^T$. Since $\Pi^d$ is (row) stochastic,
we have $\Pi^d e = e$. Hence, if $v$ is a left eigenvector for $\Pi^d$ with eigenvalue $\mu^d$, then:

    $v \Pi^d e = v e = \mu^d v e \Rightarrow (1 - \mu^d) v e = 0$    (4.18)

implying that if $\mu^d \neq 1$, then $v e = 0$. Now we construct $C = [C_1 \cdots C_n]$, where
$C_j = \min_i \Pi^d_{ij}$ (minimum column element). Considering $M = \Pi^d - eC$, we note:

    $(v \Pi^d = \mu^d v) \wedge (\mu^d \neq 1) \Rightarrow v M = \mu^d v$    (4.19)

Recalling that stationary probability vectors (Perron vectors) of stochastic matrices
add up to unity, we have:

    $(v \Pi^d = v) \Rightarrow v M = v - v e C = v - C$    (4.20)

which, along with the fact that $C$ is not identically zero, implies that an
upper bound on the magnitudes of the eigenvalues of $M$ provides an upper bound on
the magnitude of non-unity eigenvalues for $\Pi^d$. Now, invoking the Gerschgorin Circle
Theorem [51],[52], we get:

    $|\mu^d| \leq 1 - \sum_j C_j = 1 - C_a \Rightarrow |\mu| \leq (1 - C_a)^{1/d}$    (4.21)

where $C_a$ is the minimum column element corresponding to the absorbing state. $1 - C_a$
is the maximum probability of not reaching the absorbing state after $d$ steps from any
state, which is bounded above by $a^{d_1} b^{d - d_1}$, where $a$ is the maximum non-diagonal
entry in $\Pi$ not going to the absorbing state, $b$ is the maximum of the non-unity
diagonal entries in $\Pi$, and $d_1$ is a bounded integer. Since any sequence of non-selfloops
is absorbed in a finite number of steps (strongly absorbing property), we have a finite
bound for $d_1$. Hence we have:

    $|\mu| \leq \lim_{d \to \infty} a^{d_1/d} \, b^{1 - d_1/d} = b = \max_{q_i : \Pi_{ii} < 1} \Pi_{ii}$    (4.22)

This completes the proof. □
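As a small worked illustration of Proposition 4 (the matrix below is a hypothetical example, not taken from the paper), consider a three-state chain whose third state is absorbing and whose only cycles are self-loops:

```latex
\[
\Pi =
\begin{bmatrix}
\tfrac{1}{2} & \tfrac{1}{4} & \tfrac{1}{4} \\
0            & \tfrac{1}{3} & \tfrac{2}{3} \\
0            & 0            & 1
\end{bmatrix},
\qquad
\operatorname{spec}(\Pi) = \left\{ \tfrac{1}{2},\ \tfrac{1}{3},\ 1 \right\}.
\]
% Pi is upper triangular, so its eigenvalues are its diagonal entries.
% The non-unity eigenvalues are 1/2 and 1/3, and the largest non-unity
% diagonal entry is 1/2, so the bound of Proposition 4 reads
\[
\max_{\mu \neq 1} |\mu| = \tfrac{1}{2} \;\leq\; \max_{q_i : \Pi_{ii} < 1} \Pi_{ii} = \tfrac{1}{2}.
\]
```

Here the bound holds with equality, since for a strongly absorbing graph the states can be ordered so that the transition matrix is triangular.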

Next, we make rigorous our notion of policy performance, and near-global or
$\epsilon$-optimality.

Definition 14 (Policy Performance & $\epsilon$-Optimality). The performance vector
$\rho^S$ of a given routing policy $S$ is the vector of agent-specific probabilities of a packet
eventually reaching the target. A policy $U$ has Utopian performance if its performance
vector (denoted as $\rho^U$) element-wise dominates the one for any arbitrary policy $S$, i.e.
$\forall q_i \in Q_N$, $\rho^U_i \geq \rho^S_i$. A policy $P$ has $\epsilon$-optimal performance, if for $\epsilon > 0$, we have:

    $\| \rho^P - \rho^U \|_\infty \leq \epsilon$    (4.23)



For a chosen $\theta$, the limiting policy $P_\theta$ computed by Algorithm 3 results in
element-wise maximization of the measure vector over all possible supervision policies (where
supervision is to be understood in the sense of the defined control philosophy). $\nu^\infty_\theta$
is related to the policy performance vector $\rho^{P_\theta}$ as follows. Selective disabling of the
transitions dictated by the policy $P_\theta$ induces a controlled PFSA, which represents the
optimally supervised network, for a given $\theta$. Let the optimized transition matrix be
$\Pi^\star_\theta$, and its Cesaro limit be $\mathscr{P}^\star_\theta$. (Note: $\Pi^\star_\theta$, $\mathscr{P}^\star_\theta$ are stochastic matrices.) Then:

    $\forall q_i \in Q_N, \quad \mathscr{P}^\star_\theta \chi \big|_i = \mathscr{P}^\star_\theta \big|_{i(q_{TGT})} = \rho^{P_\theta}_i$    (4.24)

We would need to distinguish between the optimal measure vector $\nu^\star_{\theta'}$ (optimal for a
given $\theta = \theta'$) computed by Algorithm 3, and the one obtained by first computing $\nu^\star_{\theta'}$
and then using the PFSA structure obtained in the process to compute the measure
vector for some other value of $\theta = \theta''$. These two vectors may not be identical.

Notation 4. In the sequel, we denote the vector obtained in the latter case as
$\nu^\star_{(\theta', \theta'')}$, implying that we have $\nu^\star_{(\theta, \theta)} = \nu^\star_\theta$.

Lemma 2. We have the following equalities:

    $\lim_{\theta' \to 0^+} \nu^\star_{(\theta, \theta')} = \rho^{P_\theta}$    (4.25a)

    $\lim_{\theta \to 0^+} \nu^\star_{(\theta, \theta)} = \rho^U$    (4.25b)

Proof. Recalling Eq. (4.24), and noting that for any PFSA with transition matrix
$\Pi$ (with Cesaro limit $\mathscr{P}$) we have
$\lim_{\theta \to 0^+} \nu_\theta = \lim_{\theta \to 0^+} \theta [\mathbb{I} - (1-\theta)\Pi]^{-1} \chi = \mathscr{P}\chi$,
we have Eq. (4.25a). In general, different choices of $\theta$ result in different disabling
decisions, and hence different policies. However, since there is at most a finite number
of distinct policies for a finite network, there must exist a $\theta_\star$ such that for all choices
$0 < \theta \leq \theta_\star$, the policy remains unaltered (although the measure values may differ).
Since executing the optimization with vanishingly small $\theta$ yields a performance vector
identical (in the limit) with the optimal measure vector element-wise dominating
the one for any arbitrary policy, the policy obtained for $0 < \theta \leq \theta_\star$ has Utopian
performance. Hence:

    $\lim_{\theta \to 0^+} \nu^\star_{(\theta, \theta)} = \lim_{\theta' \to 0^+} \nu^\star_{(\theta_\star, \theta')} = \rho^{P_{\theta_\star}} = \rho^U$    (4.26)

This completes the proof. □

Computation of the critical $\theta_\star$ is non-trivial from a distributed perspective,
although centralized approaches have been reported [5]. Thus it is hard to guarantee
Utopian performance in Algorithm 3. Also, $\theta_\star$ may be too small, resulting in an
unacceptably poor convergence rate. Nevertheless, we will show that, given any $\epsilon > 0$, one
can choose $\theta$ to guarantee $\epsilon$-optimal performance of the limiting policy in the sense
of Definition 14. We would need the following result.

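Lemma 3. Given any PFSA, with transition matrix $\Pi$ and corresponding Cesaro
limit $\mathscr{P}$, and $\mu$ being a non-unity eigenvalue of $\Pi$ with maximal magnitude, we have:

    $\big\| \theta [\mathbb{I} - (1-\theta)\Pi]^{-1} - \mathscr{P} \big\|_\infty \leq \frac{\theta}{1 - |\mu|}$    (4.27a)

    $\big\| \nu^\star_{(\theta, \theta)} - \lim_{\theta' \to 0^+} \nu^\star_{(\theta, \theta')} \big\|_\infty \leq \frac{\theta}{1 - |\mu|}$    (4.27b)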
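Proof. Denoting $M = [\mathbb{I} - (1-\theta)\Pi]^{-1} - \frac{1}{\theta}\mathscr{P}$,

    $M = [\mathbb{I} - (1-\theta)\Pi]^{-1} - \mathscr{P} \sum_{k=0}^{\infty} (1-\theta)^k = \sum_{k=0}^{\infty} (1-\theta)^k (\Pi - \mathscr{P})^k - \mathscr{P}$
    $\quad = [\mathbb{I} - (1-\theta)(\Pi - \mathscr{P})]^{-1} - \mathscr{P}$

We note that if $u$ is a left eigenvector of $\Pi$ with unity eigenvalue, then $u\mathscr{P} = u$. Also,
if the eigenvalue corresponding to $u$ is strictly within the unit circle, then $u\mathscr{P} = 0$.
After a little algebra, it follows that if $u$ is in the left eigenspace (denoted as $E(1)$)
corresponding to unity eigenvalues of $\Pi$, then $uM = 0$; otherwise, $uM = \frac{1}{1 - (1-\theta)\mu} u$,
where $\mu$ is a non-unity eigenvalue for $\Pi$. Invoking the definition of induced matrix
norms, and noting $\|A\|_\infty = \|A^T\|_1$ for any square matrix $A$:

    $\|M\|_\infty = \max_{\|u\|_1 = 1} \|uM\|_1 = \max_{\|u\|_1 = 1 \wedge u \notin E(1)} \|uM\|_1$    (4.28)

We further note that since $[\mathbb{I} - (1-\theta)(\Pi - \mathscr{P})]$ is guaranteed to be invertible [8],
its eigenvectors form a basis, implying:

    $u = \sum_j c_j v^j$, with $\big\| \sum_j c_j v^j \big\|_1 = 1$    (4.29)

where $v^j$ are eigenvectors of $[\mathbb{I} - (1-\theta)(\Pi - \mathscr{P})]^{-1}$ with non-unity eigenvalues, and
$c_j$ are complex coefficients. An upper bound for $\|M\|_\infty$ can now be computed as:

    $\|M\|_\infty \leq \frac{1}{1 - (1-\theta)|\mu|} \leq \frac{1}{1 - |\mu|}$

where $\mu$ is a non-unity eigenvalue for $\Pi$ with maximal magnitude. Since
$\theta M = \theta [\mathbb{I} - (1-\theta)\Pi]^{-1} - \mathscr{P}$, this establishes
Eq. (4.27a). Finally, noting:

    $\nu^\star_{(\theta, \theta)} - \lim_{\theta' \to 0^+} \nu^\star_{(\theta, \theta')} = \big( \theta [\mathbb{I} - (1-\theta)\Pi]^{-1} - \mathscr{P} \big) \chi$

establishes Eq. (4.27b). □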

The next proposition is the key result relating a specific choice of $\theta$ to guaranteed
$\epsilon$-optimal performance.

Proposition 5 (Global $\epsilon$-Optimality). Given any $\epsilon > 0$, choosing $\theta = \epsilon / m^2$, where
$m = \max_{q \in Q} \mathrm{Card}(\mathcal{N}(q))$, guarantees that the limiting policy computed by Algorithm 3 is
$\epsilon$-optimal in the sense of Definition 14.



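Proof. We observe that the limiting measure values $\nu^\infty_\theta|_i = \nu^\star_\theta|_i$ computed by
Algorithm 3 can be represented by convergent sums of the form ($a_{ij}$: non-negative reals):

    $\forall q_i \in Q_N, \quad \nu^\infty_\theta|_i = \sum_j a_{ij} (1-\theta)^j$    (4.30)

implying that for each $q_i \in Q$, $\nu^\star_{(\theta, \theta_1)}|_i$ (see Notation 4) is a monotonically decreasing
function of $\theta_1$ in the domain $[0, \theta]$. We note that if the following statement:

    $\forall \theta_1 \leq \theta, \ \forall q_i, q_j \in Q_N, \quad \big( \nu^\star_{(\theta, \theta)}|_i > \nu^\star_{(\theta, \theta)}|_j \big) \Rightarrow \big( \nu^\star_{(\theta, \theta_1)}|_i \geq \nu^\star_{(\theta, \theta_1)}|_j \big)$

is true, then we have Utopian performance for policy $P_\theta$, i.e., $\rho^{P_\theta} = \rho^U$. Hence, if
$\rho^{P_\theta} \neq \rho^U$, then we must have:

    $\exists \theta_2 < \theta, \ \exists q_i, q_j \in Q_N, \quad \big( \nu^\star_{(\theta, \theta)}|_i > \nu^\star_{(\theta, \theta)}|_j \big) \wedge \big( \nu^\star_{(\theta, \theta_2)}|_j > \nu^\star_{(\theta, \theta_2)}|_i \big)$

upon which Eq. (4.30), along with the bound established in Eq. (4.27a), guarantees
that if $q_i, q_j$ are agents (in consecutive order) that satisfy the above statement, then:

    $\lim_{\theta_1 \to 0^+} \big| \nu^\star_{(\theta, \theta_1)}|_i - \nu^\star_{(\theta, \theta_1)}|_j \big| \leq \beta_\theta \theta$    (4.31)

where $\beta_\theta = \frac{1}{1 - |\mu|}$, with $\mu$ being a maximal non-unity eigenvalue of the transition
matrix of the PFSA computed by Algorithm 3 at $\theta$. Next we claim:

    $\forall \theta' \in (0, \theta], \quad \big\| \nu^\star_{\theta'} - \nu^\star_{(\theta, \theta')} \big\|_\infty \leq m^2 \theta$    (4.32)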



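We observe that, for any $\theta'$, the optimal policy $P_{\theta'}$ can be obtained by beginning
with the PFSA induced by $P_\theta$ (which is the optimal policy at $\theta$), and then executing
the centralized iterative approach [5], resulting in a sequence of element-wise
non-decreasing measure vectors converging to the optimal $\nu^\star_{\theta'}$:

    $\nu^\star_{(\theta, \theta')} = \nu^{[0]}_{\theta'} \leqq \nu^{[1]}_{\theta'} \leqq \cdots \leqq \nu^{[k_\star]}_{\theta'} = \nu^\star_{\theta'}$    (4.33)

where $\nu^{[k]}_{\theta'}$ is the vector obtained after the $k$th iteration, and $k_\star < \infty$ is the number of
required iterations. Since $\nu^{[k]}_{\theta'} = \theta' [\mathbb{I} - (1-\theta')\Pi^{[k]}]^{-1} \chi$, where the transition matrix
after the $k$th iteration is $\Pi^{[k]}$, and setting $\Delta^{[k]}_{\theta'} = \nu^{[k]}_{\theta'} - \nu^\star_{(\theta, \theta')}$, we have:

    $\Delta^{[k]}_{\theta'} = (1-\theta') [\mathbb{I} - (1-\theta')\Pi^{[k]}]^{-1} (\Pi^{[k]} - \Pi^{[0]}) \nu^\star_{(\theta, \theta')}$
    $\quad = \frac{1-\theta'}{\theta'} \underbrace{\big\{ \theta' [\mathbb{I} - (1-\theta')\Pi^{[k]}]^{-1} \big\}}_{B^{[k]}_{\theta'}} \underbrace{\big\{ (\Pi^{[k]} - \Pi^{[0]}) \nu^\star_{(\theta, \theta')} \big\}}_{J^{[k]}}$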

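For $q_i \in Q$, let $U^{[0 \to k]}_i$ be the set of transitions $(q_i \xrightarrow{\sigma} q_j)$ which are updated (i.e.
enabled if disabled, or vice versa) to go from the configuration corresponding to $\nu^{[0]}_{\theta'}$
to the one corresponding to $\nu^{[k]}_{\theta'}$. We note that:

    $U^{[0 \to k]}_i = \big( U^{[0 \to 1]}_i \cap U^{[0 \to k]}_i \big) \cup W$

where $W = U^{[0 \to k]}_i \setminus \big( U^{[0 \to 1]}_i \cap U^{[0 \to k]}_i \big)$. The $i$th row of $\Pi^{[k]}$ is obtained from $\Pi^{[0]}$
by disabling or enabling controllable transitions $q_i \to q_j$ according to the comparison of
$\nu^{[0]}_{\theta'}|_j$ with $\nu^{[0]}_{\theta'}|_i$,
and each such update leads to a positive contribution in the corresponding row of $J^{[k]}$.
It follows that updating any transition $t \equiv (q_i \xrightarrow{\sigma} q_j) \in \big( U^{[0 \to 1]}_i \cap U^{[0 \to k]}_i \big)$ leads to
a positive contribution to $J^{[k]}|_i$ given by:

    $C_t = \tilde{\pi}(q_i, \sigma) \, \big| \nu^{[0]}_{\theta'}|_i - \nu^{[0]}_{\theta'}|_j \big|$    (4.34)

Every $t' \equiv (q_i \xrightarrow{\sigma'} q_k) \in W$ causes a negative contribution to $J^{[k]}|_i$, given by:

    $C_{t'} = -\tilde{\pi}(q_i, \sigma') \, \big| \nu^{[0]}_{\theta'}|_i - \nu^{[0]}_{\theta'}|_k \big|$    (4.35)

implying that:

    $J^{[k]}|_i \leq \sum_{\tau \in ( U^{[0 \to 1]}_i \cap U^{[0 \to k]}_i )} C_\tau$    (4.36)

    $\Rightarrow J^{[k]}|_i \leq \sum_\sigma \tilde{\pi}(q_i, \sigma) \beta_\theta \theta = \beta_\theta \theta$    (see Eq. (4.31))

Since the rows corresponding to the absorbing states have no controllable transitions,
absorbing states must remain absorbing throughout the iterative sequence, and the
corresponding entries in $J^{[k]}$ for all $k \in \{0, \cdots, k_\star\}$ are strictly 0. It follows:

    $J^{[k]}|_i \begin{cases} = 0 & \text{if } q_i \text{ is absorbing} \\ \in [0, \beta_\theta \theta] & \text{otherwise} \end{cases}$    (4.37)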

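Stochasticity of $B^{[k]}_{\theta'}$ implies that in the limit $\theta' \to 0^+$, $B^{[k]}_{\theta'}$ converges to the Cesaro
limit of $\Pi^{[k]}$. Applying Lemma 3:

    $\big\| B^{[k]}_{\theta'} - \lim_{\theta' \to 0^+} B^{[k]}_{\theta'} \big\|_\infty \leq \frac{\theta'}{1 - |\mu_{\theta'}|} = \beta_{\theta'} \theta'$    (4.38)

where $\mu_{\theta'}$ is a non-unity eigenvalue for $\Pi^{[k]}$ with maximal magnitude. Using the
invariance of the absorbing state set, and observing that the Cesaro limit $\lim_{\theta' \to 0^+} B^{[k]}_{\theta'}$
has strictly zero columns corresponding to non-absorbing states, we conclude:

    $\forall \theta' \in (0, \theta], \quad \big\| \Delta^{[k]}_{\theta'} \big\|_\infty \leq \frac{1}{\theta'} \beta_{\theta'} \theta' \beta_\theta \theta = \beta_{\theta'} \beta_\theta \theta$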
It is easy to see that the PFSA induced by $P_\theta$ is strongly absorbing (Definition 13),
and so is each one obtained in the iteration. Also, the virtual states in our network
model have no controllable transitions, and have no self-loops. Physical agents can
have self-loops arising from disablings; but for a non-absorbing agent with at most
$m$ neighbors, the self-loop probability is bounded by $(m-1)/m$, which then implies
$\beta_{\theta'}, \beta_\theta \leq \frac{1}{1 - (m-1)/m} = m$ (Proposition 4). Hence:

    $\forall \theta' \in (0, \theta], \quad \big\| \Delta^{[k]}_{\theta'} \big\|_\infty \leq m^2 \theta$    (4.39)



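Thus, if we choose $\theta = \epsilon / m^2$, we can argue:

    $\forall k \in \{0, \cdots, k_\star\}, \ \forall \theta' \in (0, \theta], \quad \big\| \Delta^{[k]}_{\theta'} \big\|_\infty \leq \epsilon$

    $\Rightarrow \lim_{\theta' \to 0^+} \big\| \nu^\star_{\theta'} - \nu^\star_{(\theta, \theta')} \big\|_\infty \leq \epsilon$

    $\Rightarrow \big\| \lim_{\theta' \to 0^+} \nu^\star_{\theta'} - \lim_{\theta' \to 0^+} \nu^\star_{(\theta, \theta')} \big\|_\infty \leq \epsilon$    (Continuity of norm)

    $\Rightarrow \big\| \rho^U - \rho^{P_\theta} \big\|_\infty \leq \epsilon$    (Using Lemma 2)

which completes the proof. □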
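In practice, Proposition 5 reduces to a one-line parameter choice. A minimal sketch, in which the function name `choose_theta` and the neighbor-map layout are assumptions made for illustration:

```python
def choose_theta(neighbors, eps):
    """Return theta = eps / m**2, with m the largest neighbourhood size (Proposition 5)."""
    m = max((len(v) for v in neighbors.values()), default=0)
    if m == 0:
        raise ValueError("no agent has any neighbour")
    return eps / (m * m)


print(choose_theta({"a": ["b", "c"], "b": ["a"], "c": []}, 0.1))
```

Because $\theta$ scales as $1/m^2$, denser neighborhoods force a smaller $\theta$ and therefore slower convergence of the distributed updates for the same $\epsilon$.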

Once we have guaranteed convergence to an $\epsilon$-optimal policy, we need to compute
asymptotic bounds on the time-complexity of route convergence, i.e., how long it
takes to converge to the limiting policy so that the local routing decisions no longer
fluctuate. In practice, the convergence time depends on the network delays, the
degree to which the agent updates are synchronized, etc., and is difficult to estimate.
In this paper, we neglect such effects to obtain an asymptotic estimate in the ideal
situation. This allows us to quantify the dependence of the convergence time on
key parameters such as $N$, $m$ and $\epsilon$. Future work will address situations where
such possibly implementation-dependent effects are explicitly considered, resulting in
potentially smaller convergence rates.

Proposition 6 (Asymptotic Runtime Complexity). With no communication
delays and assuming synchronized updates, the convergence time $T_c$ to $\epsilon$-optimal operation
for a network of $N$ physical agents and maximum $m$ neighbors satisfies:

    $T_c = O\left( \frac{N m^2}{\epsilon (1 - \gamma_\star)} \right)$

where $\gamma_\star$ is a lower bound on failure probabilities.



Proof. Synchronized updates imply that we can assume the following recursion:

    $\hat{\nu}^{[0]}_\theta = 0$ (Zero vector)    (4.40a)

    $\hat{\nu}^{[k+1]}_\theta = (1-\theta) \Pi^{[k]} \hat{\nu}^{[k]}_\theta + \theta \chi$    (4.40b)

which can be used to obtain the upper bound:

    $\big\| \hat{\nu}^{[k]}_\theta - \nu^\star_\theta \big\|_\infty \leq (1-\theta)^k$    (4.41)

implying that after $k$ updates, each agent is within $(1-\theta)^k$ of its limiting value.
Denoting the smallest difference of measures as $\Delta_\star$, we note that $(1-\theta)^k \leq \Delta_\star$ would
guarantee that no further route fluctuation occurs, and the network operation will
be $\epsilon$-optimal from that point onwards. To estimate $\Delta_\star$, we note that 1) comparisons
cannot be made for values closer than the machine precision $M_0$, and 2) the lowest
possible non-zero measure in the network occurs at the network boundaries if we
assume the worst case scenario in which the failure probability is always $\gamma_\star$. We
recall that the measure of an agent is the sum of the measures of all paths initiating from
the particular agent and terminating at the target. Also, note that any such path
accumulates a multiplicative factor of $(1-\theta)^2 (1-\gamma_\star)$ in each hop. In the worst case
a given agent is $N$ hops away, and has a single path to the target, implying that
the smallest non-zero measure of any agent is bounded below by $\big( (1-\theta)^2 (1-\gamma_\star) \big)^N$.
Hence:

    $\Delta_\star \geq M_0 \big( (1-\theta)^2 (1-\gamma_\star) \big)^N$    (4.42)

and hence a sufficient condition for convergence is:

    $(1-\theta)^k = M_0 \big( (1-\theta)^2 (1-\gamma_\star) \big)^N \Rightarrow (1-\theta)^{(k - 2N)} = M_0 (1-\gamma_\star)^N$

    $\Rightarrow k = 2N + \frac{\log M_0}{\log(1-\theta)} + N \frac{\log(1-\gamma_\star)}{\log(1-\theta)}$

Treating $M_0$ as a constant, we have $\frac{\log M_0}{\log(1-\theta)} = O\left(\frac{1}{\theta}\right)$. Since $\theta$ must be small for
near-optimal operation and considering the worst case $\gamma_\star \ll 1$, we have, writing
$h = \frac{\log(1-\gamma_\star)}{\log(1-\theta)}$:

    $(1-\theta)^h \approx (1 - h\theta) = 1 - \gamma_\star \Rightarrow h\theta = \gamma_\star \Rightarrow h = \frac{\gamma_\star}{\theta} = O\left( \frac{1}{\theta (1-\gamma_\star)} \right)$

    $\Rightarrow k = O\left( N + \frac{1}{\theta} + \frac{N}{\theta (1-\gamma_\star)} \right) = O\left( \frac{N}{\theta (1-\gamma_\star)} \right)$

    $\Rightarrow k = O\left( \frac{N m^2}{\epsilon (1-\gamma_\star)} \right)$    (Using Proposition 5)

Thus we have $T_c = O(k)$, which completes the proof. □
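The sufficient condition used in the proof, $(1-\theta)^k \leq M_0 ((1-\theta)^2 (1-\gamma_\star))^N$, can be solved for $k$ directly. The following sketch does so; the function name `sweeps_for_convergence` and the default choice of double-precision machine epsilon for $M_0$ are assumptions made for illustration.

```python
import math


def sweeps_for_convergence(n_agents, theta, gamma, machine_eps=2.0 ** -52):
    """Smallest integer k with (1-theta)^k <= M0 * ((1-theta)^2 * (1-gamma))^N."""
    log_rhs = math.log(machine_eps) + n_agents * (
        2.0 * math.log(1.0 - theta) + math.log(1.0 - gamma))
    # log(1 - theta) is negative, so dividing flips the inequality.
    return math.ceil(log_rhs / math.log(1.0 - theta))


for n in (10, 20, 40):
    print(n, sweeps_for_convergence(n, theta=0.01, gamma=0.05))
```

For fixed $\theta$ and $\gamma_\star$ the returned count grows linearly in $N$, in line with the worst-case bound of Proposition 6.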

It follows from Proposition 6 that for constant $\epsilon$ and $\gamma_\star$, and large networks
with a relatively small number of local neighbors such that $N \gg m$, we will have
$T_c = O(N)$. Detailed simulation, on the other hand, indicates that this bound is not
tight, as illustrated in Figure 6.1a, where we see a logarithmic dependence instead.
The stationary policy computed for the frozen swarm has some additional properties,
as we establish next.

Proposition 7 (Properties). The limiting frozen policy is stationary and has
the following additional properties:

1. It is loop-free.

2. It is the unique loop-free policy that disables the smallest set of transitions among
all policies which induce the same measure vector for a given $\theta$.

Proof. Stationarity is obvious. (1) Absence of loops follows immediately from
noting that, in the limiting policy, a controllable transition $q_i \to q^v_{ij}$ is enabled if
and only if $q^v_{ij}$ has a limiting measure strictly greater than that of $q_i$, implying that
any sequence of transitions (with no consecutive repeating states) goes to either the
dump or the target in a finite number of steps.

(2) follows directly from the uniqueness and the maximal permissivity property
of optimal policies computed by language-measure-theoretic optimization (See [!]). □

Table 4.1: Instantaneous Agent Data Table. Columns: Id.; Neighbor # (self listed first,
followed by the neighbors up to $m$); Current Measure; Failure Probability; Forwarding Decision.

One can easily tabulate the data that needs to be maintained at each agent (See 
Table 4.1). In particular, each agent needs to know the unique network id. of each 
neighbor that it can communicate with (Col. 1), and their current measure values 
(Col. 3). The failure probabilities for communicating from self to each of those 
neighbors must be maintained as well, for the purpose of carrying out the distributed 
updates (Col. 4). The forwarding decision is a neighbor-specific Boolean value (Col. 
5), which is set to 1 if the neighbor currently has a strictly higher measure than 
self, and 0 otherwise. The packets are then forwarded by randomly choosing (in 
an equiprobable manner) between the enabled neighbors, i.e., the ones with a true 
forwarding decision. Note that this agent data updates when the measures of the 
neighbors change (Col. 3), or the failure probabilities (Col. 4) update. However, 
changes in the measures do not necessarily result in a change in the forwarding 
decisions. Also, note that the routing is inherently probabilistic (due to the possibility 
that multiple enabled neighbors may exist for a given agent). Furthermore, the 
optimal policy disables navigation decisions to as few neighbors as possible for a specified 
$\theta$ (Proposition 7), and hence exploits available alternate routes in an optimal manner, 
thereby reducing congestion. 
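
The per-agent bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `NeighborRow`, `update_forwarding` and `choose_next_hop` are hypothetical and do not come from the paper, and the measure values are taken as given rather than computed by the distributed update.

```python
import random
from dataclasses import dataclass


@dataclass
class NeighborRow:
    """One row of the agent data table (hypothetical field names)."""
    neighbor_id: int        # Col. 1: unique network id of the neighbor
    measure: float          # Col. 3: neighbor's current measure
    failure_prob: float     # Col. 4: failure probability of the link self -> neighbor
    forward: bool = False   # Col. 5: forwarding decision


def update_forwarding(self_measure, rows):
    """Enable exactly the neighbors whose measure is strictly higher than self."""
    for row in rows:
        row.forward = row.measure > self_measure
    return rows


def choose_next_hop(rows, rng=random):
    """Pick an enabled neighbor uniformly at random; None if none is enabled."""
    enabled = [row.neighbor_id for row in rows if row.forward]
    return rng.choice(enabled) if enabled else None
```

Note that a change in a neighbor's measure triggers `update_forwarding`, but the resulting Boolean column may be unchanged, which mirrors the remark that measure updates need not alter forwarding decisions.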

5. Simultaneous Navigation & Decision Optimization: The "Unfrozen" 
Case. Notation 5 (Best Neighbors). For a fixed agent $q_i$, the set of neighboring 
agents having maximal measure at operation time $t$ is denoted as $B_i(t)$. Furthermore, 
let $b_i^\star(t)$ denote a randomly chosen maximal agent, towards which $q_i$ has decided to 
move at time $t$. 

Notation 6 (Swarm Configuration). Denote the vector of positional coordinates 
of the agents at time $t$ as $\mathcal{P}(t)$. 

Definition 15 (Movement Mechanism). The positional update mechanism of 
the swarm can now be concretely stated as: 

C1 After step (a4) in the Algorithm, for each $q_i$ choose a maximal agent $b_i^\star(t)$, and 
move towards $b_i^\star(t)$ at a constant velocity, with the following restriction. 

C2 If there exists $q_j$ such that $q_i = b_j^\star(t)$, then make sure that the distance from 
$q_j$ is within the communication radius. 
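
A minimal Python sketch of one positional update under C1 and C2 follows. The function names, the dictionary-based data layout, and in particular the choice to hold position when a move would break a link to a follower are assumptions made for illustration; the definition only requires that the follower stays within the communication radius.

```python
import math


def step_towards(pos, target, v_s, dt):
    """Move pos towards target at speed v_s for time dt, without overshooting."""
    dx, dy = target[0] - pos[0], target[1] - pos[1]
    dist = math.hypot(dx, dy)
    if dist == 0.0:
        return pos
    travel = min(v_s * dt, dist)
    return (pos[0] + travel * dx / dist, pos[1] + travel * dy / dist)


def move_agent(i, positions, best, followers, v_s, dt, r_c):
    """C1: head for the chosen best neighbor best[i].
    C2: stay within r_c of every agent j that has chosen i as its best neighbor."""
    candidate = step_towards(positions[i], positions[best[i]], v_s, dt)
    for j in followers.get(i, ()):
        if math.dist(candidate, positions[j]) > r_c:
            # Assumed policy: hold position rather than break the link.
            return positions[i]
    return candidate
```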

Definition 16 (Process $\mathscr{R}_{v_s}(t,\mathcal{P}(t'))$). The stated movement mechanism induces 
a sequence of swarm configurations as a function of time $t$ denoted by $\mathscr{R}_{v_s}(t,\mathcal{P}(t'))$, 
which is understood to be the achieved vector of positional coordinates of the agents 
as a function of time $t \geq t'$, beginning with the initial configuration $\mathcal{P}(t')$ at time $t'$, 
with the constant update velocity $v_s > 0$. 
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Definition 17 (The Ideal Update Process). Assuming that the distributed route 
convergence for the frozen swarm occurs instantaneously, let $\mathscr{I}_{\Delta',v_s}(t,\mathcal{P}(t'))$ denote 
the vector of position coordinates of the agents at time $t \geq t'$, obtained as a result 
of the following sequential operation, initiated with the swarm configuration $\mathcal{P}(t')$ at 
time $t'$: 

1. Freeze swarm 

2. Optimize routes via the Algorithm (assumed to occur instantaneously for this 
definition only) 

3. For time $\Delta'$, move each agent $q_i$ towards its best neighbor (with some form 
of tie-breaking if required) with a constant velocity $v_s$. 

4. Go to step 1. 

Then the ideal update process is defined as the sequence of swarm configurations (as 
a function of the operation time $t$) given by: 

$$\mathscr{I}_{v_s}(t,\mathcal{P}(t')) = \lim_{\Delta' \to 0^+} \mathscr{I}_{\Delta',v_s}(t,\mathcal{P}(t')) \qquad (5.1)$$

$\mathscr{I}_{v_s}(t,\mathcal{P}(t'))$ has the following immediate properties: 

1. At any point in time $t \geq t'$, the routing policy in effect is globally $\epsilon$-optimal 
in the sense defined in the preceding section. 

2. Infinitesimal updates at each time $t$ occur according to such $\epsilon$-optimal policies. 

Denoting $\mathscr{I}_{v_s}(t,\mathcal{P}(t'))|_i$ and $\mathscr{R}_{v_s}(t,\mathcal{P}(t'))|_i$ as the positional coordinates of the agent 
$q_i$ at time $t$ for the respective update processes, and $\mathcal{P}_{TGT}$ as the positional coordinate 
of the target, we have the following convergence results. 
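
The freeze/optimize/move cycle of the ideal update process can be illustrated with a small Python simulation. This is a sketch under strong assumptions, not the paper's algorithm: the route optimization step is replaced by a breadth-first hop count to the target (a stand-in for the measure-based optimization), ties are broken deterministically by index, and the target is taken to be the point at index 0. The names `hop_counts` and `ideal_update` are hypothetical.

```python
import math
from collections import deque


def hop_counts(points, r_c):
    """BFS hop distance to the target (index 0) over links no longer than r_c."""
    hops = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for v in range(len(points)):
            if v not in hops and math.dist(points[u], points[v]) <= r_c:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def ideal_update(points, r_c, v_s, delta, steps):
    """Repeat: freeze, re-optimize (BFS stand-in), move every agent for time delta."""
    points = list(points)
    for _ in range(steps):
        hops = hop_counts(points, r_c)            # steps 1-2: freeze and optimize
        new_points = list(points)
        for i in range(1, len(points)):           # index 0 is the stationary target
            if i not in hops:
                continue                          # disconnected agents stay put
            better = [j for j in range(len(points))
                      if j in hops and hops[j] < hops[i]
                      and math.dist(points[i], points[j]) <= r_c]
            best = min(better, key=lambda j: (hops[j], j))  # deterministic tie-break
            (x, y), (bx, by) = points[i], points[best]
            d = math.hypot(bx - x, by - y)
            s = min(v_s * delta, d)
            if d > 0:
                new_points[i] = (x + s * (bx - x) / d, y + s * (by - y) / d)
        points = new_points                       # step 3: synchronous move
    return points
```

Choosing `v_s * delta` well below `r_c` keeps each agent within range of the neighbor it follows, so the chain to the target is preserved between re-optimizations.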

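Proposition 8 (Convergence Of Swarm Trajectories). For any initial configuration 
$\mathcal{P}(0)$ at time $t = 0$, each agent eventually converges to the target under both 
the ideal and the implementable update processes, i.e., 

$$\forall \mathcal{P}(0),\ \forall q_i \in Q,\quad \lim_{t \to \infty} \left\|\mathscr{I}_{v_s}(t,\mathcal{P}(0))|_i - \mathcal{P}_{TGT}\right\| = 0$$

$$\forall \mathcal{P}(0),\ \forall q_i \in Q,\quad \lim_{t \to \infty} \left\|\mathscr{R}_{v_s}(t,\mathcal{P}(0))|_i - \mathcal{P}_{TGT}\right\| = 0$$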

Proof. We consider the two processes $\mathscr{I}_{\Delta',v_s}(t,\mathcal{P}(0))$ and 
$\mathscr{R}_{\Delta',v_s}(t,\mathcal{P}(0))$, for some $\Delta' > 0$. We note that condition C2 in Definition 15 is automatically satisfied 
for $\mathscr{I}_{\Delta',v_s}(t,\mathcal{P}(0))$, since the distance between agent $q_i$ and $b_i^\star(t)$ is guaranteed to 
be non-increasing if the agents move with a constant velocity $v_s$. This immediately 
implies that no agent in the swarm gets disconnected, and it follows that: 

$$\nu_\theta|_i(0) > 0 \;\Rightarrow\; \forall t > 0,\ \nu_\theta|_i(t) > 0 \qquad (5.2)$$

since it is given that $\forall q_i \in Q$, $\nu_\theta|_i(0) > 0$, implying that at least one sequence of hops 
from agent $q_i$ to $q_{TGT}$ of the form $\{q_i, q_{i'}, \cdots, q_{TGT}\}$ exists at time $t = 0$, such that 

$$\nu_\theta|_i \leq \nu_\theta|_{i'} \leq \cdots \leq \nu_\theta|_{TGT} = 1 \qquad (5.3)$$

and hence at least one such sequence is guaranteed to exist for all $t > 0$. Let 
$h_t(q_i) \in \mathbb{N}$ be the minimum length of such a sequence from $q_i$ at time $t$. Since it is 
given that $\forall q_i \in Q$, $\nu_\theta|_i(0) > 0$, we have: 

$$\max_{q_i \in Q} h_0(q_i) \leq \mathrm{Card}(Q) \qquad (5.4)$$

We note that all agents $q_i$ with $h_0(q_i) = 1$ are direct neighbors of the target, and 
hence simply move towards the latter at a constant velocity $v_s$ for all times until 
convergence. Let $t'$ be the time within which all such agents do converge to the 
target. Then, it follows that: 

$$\max_{q_i \in Q} h_{t'}(q_i) \leq \max_{q_i \in Q} h_0(q_i) - 1 \qquad (5.5)$$

By continually applying the above argument, we obtain a sequence of times $\{t', t'', \cdots\}$, 
such that: 

$$\mathrm{Card}(Q) \geq \max_{q_i \in Q} h_0(q_i) > \max_{q_i \in Q} h_{t'}(q_i) > \max_{q_i \in Q} h_{t''}(q_i) > \cdots \qquad (5.6)$$

Finite size of the swarm, and the fact that the above argument applies for all $\Delta' > 0$, 
then implies the desired result. □ 

$\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ cannot be directly implemented in practice (due to the requirement 
of sequential freezing and instantaneous route optimization). However, it allows us 
to evaluate the performance of implementable policies by comparing how close the 
achieved sequence of swarm configurations is to the ideal process. Note that in spite 
of convergence, the ideal process $\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ differs significantly in definition from 
$\mathscr{R}_{v_s}(t,\mathcal{P}(0))$, and we need to establish that the latter is in some meaningful sense 
close to the former. We need the following definition, and a notion of convergence 
rate. 

Definition 18 (Path Lengths To Target). Recall that $\nu_\theta|_i(t) > 0$ implies that at 
least one sequence of hops from agent $q_i$ to $q_{TGT}$ of the form $\{q_i, q_{i'}, \cdots, q_{TGT}\}$ exists 
at time $t$, such that 

$$\nu_\theta|_i(t) \leq \nu_\theta|_{i'}(t) \leq \cdots \leq \nu_\theta|_{TGT} = 1 \qquad (5.7)$$
and $h_t(q_i)$ is the minimum length of such a hop sequence from $q_i$ at time $t$. We define 
$\bar{h}_t(q_i)$ as the physical piecewise length of such a path (denoted by the indices of the 
agent sequence for simplified notation, i.e., writing $q_i$ as $i$) 

$$S = \{i = j_1, j_2, \cdots, j_{r-1}, j_r, \cdots, j_{h_t(q_i)} = TGT\} \qquad (5.8)$$

as follows: 

$$\bar{h}_t(q_i) = \sum_{r=2}^{h_t(q_i)} \left\|\mathcal{P}(t)|_{j_{r-1}} - \mathcal{P}(t)|_{j_r}\right\| \qquad (5.9)$$

Proposition 9 (Convergence Rate). The swarm trajectories converge to the 
target at an exponential rate for both $\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ and $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$, i.e., we have: 

$$\forall q_i \in Q,\quad \bar{h}_t(q_i) \leq \bar{h}_0(q_i)\, e^{-(v_s/R_c)t} \qquad (5.10)$$
where $R_c > 0$ is the specified constant communication radius. 

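Proof. We note that in Eq. (5.9), agent $q_{j_r} \in B_{j_{r-1}}(t)$. Without loss of generality, 
we assume that $q_{j_r} = b^\star_{j_{r-1}}(t)$. Then it follows that: 

$$\frac{d}{dt}\bar{h}_t(q_i) = -h_t(q_i)\, v_s \qquad (5.11)$$

Next, we note the bound: 

$$\bar{h}_t(q_i) \leq R_c\, h_t(q_i) \;\Rightarrow\; -h_t(q_i) \leq -\frac{1}{R_c}\bar{h}_t(q_i) \qquad (5.12)$$

Using this in Eq. (5.11), we obtain: 

$$\frac{d}{dt}\bar{h}_t(q_i) \leq -\left(\frac{v_s}{R_c}\right)\bar{h}_t(q_i) \qquad (5.13)$$

which completes the proof. □ 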
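
The last step of the proof, from the differential inequality (5.13) to the exponential bound (5.10), can be spelled out with an integrating factor. The following is a sketch, writing $\bar{h}_t(q_i)$ for the physical path length of Definition 18 and assuming it is differentiable in $t$ between topology changes:

```latex
\begin{align*}
\frac{d}{dt}\Big(e^{(v_s/R_c)t}\,\bar{h}_t(q_i)\Big)
  &= e^{(v_s/R_c)t}\Big(\frac{d}{dt}\bar{h}_t(q_i) + \frac{v_s}{R_c}\,\bar{h}_t(q_i)\Big) \leq 0 \\
\Rightarrow\quad e^{(v_s/R_c)t}\,\bar{h}_t(q_i) &\leq \bar{h}_0(q_i)
\quad\Rightarrow\quad \bar{h}_t(q_i) \leq \bar{h}_0(q_i)\,e^{-(v_s/R_c)t}
\end{align*}
```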

Definition 19 (Swarm Diameter). The swarm diameter $D_t(Q)$ is defined as: 
$$D_t(Q) = 2\max_{q_i \in Q}\left\|\mathcal{P}_{q_i}(t) - \mathcal{P}_{TGT}\right\| \qquad (5.14)$$

Corollary 1 (Corollary To Proposition 9). $D_t(Q) \leq 2\max_{q_i \in Q}\bar{h}_0(q_i)\, e^{-(v_s/R_c)t}$ 
Proof. Follows immediately from noting $D_t(Q) \leq 2\max_{q_i \in Q}\bar{h}_t(q_i)$. □ 
Corollary 2 (Corollary To Proposition 9). Denoting an upper bound on the 
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time required for all agents to converge to the target as T conv , we have: 

$$v_s T_{conv} \sim \mathrm{Const.}\ \text{ for constant } R_c \qquad (5.15a)$$
$$R_c / T_{conv} \sim \mathrm{Const.}\ \text{ for constant } v_s \qquad (5.15b)$$

Proof. Replace the requirement of every agent converging to the target by one 
that requires almost all of them reaching the target, in the sense that the swarm 
diameter $D_{T_{conv}} = l$, where $0 < l \ll 1$. Using Corollary 1 we have: 

$$v_s T_{conv}/R_c \leq \ln(2/l) \qquad (5.16)$$
Since $T_{conv}$ is an upper bound on the convergence time, we can replace the inequality: 

$$v_s T_{conv}/R_c \sim \ln(2/l) = \mathrm{Const.} \qquad (5.17)$$
which implies the desired result. □ 
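
The two scaling relations can be checked numerically from the relation $v_s T_{conv}/R_c \sim \ln(2/l)$. The helper below is a hypothetical illustration (the name `t_conv_bound` and the argument checks are not from the paper); it simply solves that relation for $T_{conv}$.

```python
import math


def t_conv_bound(v_s, r_c, l):
    """Solve v_s * T_conv / R_c = ln(2 / l) for T_conv (cf. Eq. (5.17)).

    v_s: agent speed, r_c: communication radius, l: residual swarm diameter.
    """
    if not (v_s > 0 and r_c > 0 and 0 < l < 2):
        raise ValueError("need v_s > 0, r_c > 0 and 0 < l < 2")
    return (r_c / v_s) * math.log(2.0 / l)
```

For a fixed radius the product `v_s * t_conv_bound(v_s, r_c, l)` does not depend on `v_s` (the hyperbolic relation), and for a fixed speed the ratio `r_c / t_conv_bound(v_s, r_c, l)` does not depend on `r_c` (the linear relation).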

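Notation 7. We denote the set of unit vectors of velocity directions at time 
$t$ for the processes $\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ and $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ as $\partial\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ and $\partial\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ 
respectively. 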

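Proposition 10 (Asymptotic Deviation From Ideal Process). For sufficiently 
small $v_s > 0$, we have: 

$$\mathrm{Prob}\left(\left\|\partial\mathscr{R}_{v_s}(t,\mathcal{P}(0)) - \partial\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))\right\| > 0\right) \qquad (5.18)$$
$$= O\left(\frac{R_c}{v_s}\left(e^{(v_s/R_c)T_c} - 1\right)e^{-(v_s/R_c)t}\right) \qquad (5.19)$$

where $T_c$ is the convergence time of the frozen swarm. 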

Proof. We note that if the neighborhood maps and the failure probabilities do not 
change for the interval $[t - T_c, t]$, then the velocity vectors of the two processes will 
coincide. We assume that $v_s$ is small enough such that in the absence of topology 
updates, the velocity vectors for $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ coincide with those of $\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))$. 
Denoting the probability of topology update at time $t$ as $T(t)$, we note: 

$$\mathrm{Prob}\left(\left\|\partial\mathscr{R}_{v_s}(t,\mathcal{P}(0)) - \partial\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))\right\| > 0\right) = \int_{t-T_c}^{t} T(t')\,dt' \qquad (5.20)$$

$T(t)$ however is bounded above by the fraction of agents $\psi(t)$ at time $t$ that have an 
inter-agent distance $\sim R_c$, which is a necessary condition for such agents to effect a 
change in their neighborhood map. In particular, we have 

$$T(t) = O(\psi(t)) \qquad (5.21)$$
Since the swarm diameter $D_t(Q)$ is dominated by an exponentially decreasing function 
(See Corollary 1 to Proposition 9), and inter-agent distance is bounded above by 
$D_t(Q)$, we have: 

$$\psi(t) = O\left(e^{-(v_s/R_c)t}\right) \qquad (5.22)$$

The result then follows by standard algebra from Eq. (5.20). □ 
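
The standard algebra referred to in the proof amounts to integrating the exponential bound over the window $[t - T_c, t]$. As a sketch, assume the topology-update probability is bounded by $c\,e^{-(v_s/R_c)t'}$ for some constant $c > 0$ (the constant $c$ is introduced here only for illustration):

```latex
\begin{align*}
\int_{t-T_c}^{t} c\,e^{-(v_s/R_c)t'}\,dt'
 &= c\,\frac{R_c}{v_s}\Big(e^{-(v_s/R_c)(t-T_c)} - e^{-(v_s/R_c)t}\Big) \\
 &= c\,\frac{R_c}{v_s}\Big(e^{(v_s/R_c)T_c} - 1\Big)\,e^{-(v_s/R_c)t}
\end{align*}
```

Since $\frac{R_c}{v_s}\big(e^{(v_s/R_c)T_c} - 1\big) \to T_c$ as $v_s/R_c \to 0$, the bound reduces to $O(T_c)$ both for vanishing velocity and for very large communication radius.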



Proposition 10 shows that the implementable process $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ starts coinciding, 
at least in probability, with the ideal update process for small agent velocities. 
Note that as $v_s \to 0^+$ in Eq. (5.19), we have, as expected: 

$$\lim_{v_s \to 0^+} \mathrm{Prob}\left(\left\|\partial\mathscr{R}_{v_s}(t,\mathcal{P}(0)) - \partial\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))\right\| > 0\right) = O\left(R_c \lim_{v_s \to 0^+} \frac{e^{(v_s/R_c)T_c} - 1}{v_s}\right) = O(T_c) \qquad (5.23)$$

which reflects the fact that while $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ takes $O(T_c)$ time to converge, we 
assumed that $\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))$ executes instantaneous route optimization. Similarly: 

$$\lim_{R_c \to \infty} \mathrm{Prob}\left(\left\|\partial\mathscr{R}_{v_s}(t,\mathcal{P}(0)) - \partial\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))\right\| > 0\right) = O(T_c) \qquad (5.24)$$

which indicates that in the case where no topology changes occur due to the fact that 
all agents are neighbors of each other, the ideal process is faster for the same reason. 
If there is no communication, then no optimization is possible for either process: 

$$\lim_{R_c \to 0} \mathrm{Prob}\left(\left\|\partial\mathscr{R}_{v_s}(t,\mathcal{P}(0)) - \partial\mathscr{I}_{v_s}(t,\mathscr{R}_{v_s}(t,\mathcal{P}(0)))\right\| > 0\right) = 0 \qquad (5.25)$$

Fig. 6.1: Convergence complexity for distributed optimization of the frozen network: 
(a) illustrates dependence on network size, (b) captures the $O(1/\epsilon)$ dependence. 
Convergence dynamics: (c) rapid convergence to large random target movements, 
(d) robust response to large zero-mean variations in the failure probabilities 

6. Simulation Results. 

6. Simulation Results. 

6.1. Complexity of Optimization In The Frozen Swarm. Extensive simulations 
on the NS2 network simulator are used to investigate how convergence times scale 
as a function of the network size (Figure 6.1). $10^2$ random topologies were considered 
for each $N$ (increased from 25 to 1600), and the mean times along with the max-min 
bars are plotted in Figure 6.1a. Note that the abscissa is on a logarithmic scale, and 
the near linear nature of the plot indicates a logarithmic dependence of the convergence 
time on network size, implying that the bound computed in Proposition 6 is possibly 
not tight. The dependence on $\epsilon$ shown in Figure 6.1b (for $N = 10^3$) is hyperbolic, as 
expected, leading to a near linear dependence after a smoothing spline fit on a log-log 
scale. Convergence times are estimated from NS2 output (using the 802.11 standard). 
Theoretical convergence results are illustrated in Figure 6.1(c-d), generated on a 
$10^4$ frozen agent network. Figure 6.1c illustrates the variation of the number of route 
updates (# of forwarding decision corrections) and the norm of the performance vector 
$\rho_P$ (scaled up by a multiplicative factor of 2) when the target is moved around 
randomly at a slower time scale. Since $\rho_P$ is the vector of end-to-end success probabilities 
(See Definition 14), its norm captures the degree of expected throughput across the 
network. Note that target changes induce self-organizing corrections, which rapidly 
die down, with the performance converging close to the global optimum ($\epsilon = 0.001$ was 
assumed in all the simulations). The failure probabilities are chosen randomly, and, 
on the average, held constant in the course of the simulation illustrated in Figure 6.1c 
(zero mean Gaussian noise is added to illustrate robustness). Note that the seemingly 
large fluctuations in the performance norm are unavoidable; the interval $\tau$ is approximately 
what it takes for information to percolate through the network, and hence 
this much time is necessary at a minimum for decentralized route convergence. 
Figure 6.1d illustrates the effect of large zero-mean stochastic variations in the failure 
probabilities. Each agent estimates the failure probabilities from a simple windowed 
average of the link-specific packet failures. We note that large sustained fluctuations 
result in sustained corrections in the forwarding decisions (which no longer go to 
zero). However, the norm of the performance vector converges and holds steady. This 
illustrates that the information percolation strategy induces a low-pass filter 
eliminating high-frequency fluctuations. A small number of route fluctuations always 
occur (note the non-zero number of corrections), but this does not induce significant 
performance variations. 
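
The windowed-average estimate of link failure probabilities mentioned above might look as follows. This is a minimal sketch: the class name `LinkFailureEstimator`, the window length, and the default returned before any packet has been observed are all assumptions, since the text does not specify them.

```python
from collections import deque


class LinkFailureEstimator:
    """Sliding-window estimate of the failure probability of one link."""

    def __init__(self, window=50):
        # Keep only the most recent `window` outcomes: 1 = failed, 0 = delivered.
        self.outcomes = deque(maxlen=window)

    def record(self, failed):
        """Record the outcome of one packet sent over the link."""
        self.outcomes.append(1 if failed else 0)

    def estimate(self, default=0.0):
        """Fraction of failures in the window; `default` if nothing was observed."""
        if not self.outcomes:
            return default
        return sum(self.outcomes) / len(self.outcomes)
```

A short window tracks changes in link quality quickly but passes more noise to the measure updates; a long window smooths the estimate at the cost of slower adaptation.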

6.2. Simulation Studies On Mobile Swarms. First, we need to validate the 
theoretical development presented in Section 5. For that purpose, we consider a swarm 
of $10^4$ agents, with a single target. The agents are initially distributed uniformly over 
a 100 m × 100 m region. The failure probabilities are assumed to be linearly decreasing 
functions of the inter-agent distances. They are also given a random spatial 
dependence. The system is simulated in accordance with the described algorithms, at various 
values of the communication radius $R_c$ and the agent velocity $v_s$. Convergence time 
$T_{conv}$ is the time required for nearly all agents (> 99.9%) to reach the target. The 
results are shown in Figure 6.2. Note that Figure 6.2a validates Proposition 10 in that 
the ideal process $\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ closely matches the implementable process $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ 
with respect to the fraction of agents reaching the goal as a function of the simulation 
time. For our simulated system the ideal process $\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ was obtained by 
freezing the swarm for 1000 simulation ticks after every tick that caused any position 
updates. Since the frozen swarm is guaranteed to converge to the optimal routes (as 
shown in Section 4), this procedure ensures that the achieved position updates closely 
approximate the ideal process. In logging simulation time, we ignored the time spent 
in the frozen optimization. Figures 6.2b-6.2d validate Corollary 2 by showing that 
for constant communication radius $R_c$, the convergence time $T_{conv}$ does indeed vary 
hyperbolically with $v_s$ (Figures 6.2b and 6.2c), and for constant $v_s$, it increases linearly 
with the communication radius $R_c$ (Figure 6.2d). As the communication radius is 
decreased, we eventually encounter a point where the swarm begins to get disconnected, 
which is reflected in the rapid increase of $T_{conv}$ at low values of $R_c$. If we identify $R_c$ 
with the amount of energy spent in communication, we note that there is an optimum 
value at which $T_{conv}$ is minimized. The high-traffic paths generated in the swarm 
are very different for different communication radii. This is illustrated in Figure 6.3 
(with $10^4$ agents), where we note that for small $R_c$ we see the development of distinct 
paths. 
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Fig. 6.2: Simulation results validating theoretical development in Section |5j 
(a) $\mathscr{I}_{v_s}(t,\mathcal{P}(0))$ and $\mathscr{R}_{v_s}(t,\mathcal{P}(0))$ match closely w.r.t. the fraction reaching the 
target, (b) variation of $T_{conv}$ with velocity $v_s$ at constant communication radius $R_c$ on 
a log-log scale showing the predicted hyperbolic relationship (Corollary 2), (c) same 
data on linear scale, (d) variation of $T_{conv}$ with $R_c$ at constant $v_s$ showing the 
predicted linear relationship. At low values of $R_c$ the swarm begins to get disconnected, 
resulting in a rapid increase in $T_{conv}$ 



6.3. Simulation Case Studies. We present simulation results for a series of 
different scenarios. All simulations are done with a minimum of 10 agents, and the 
communication radius $R_c$ and the swarm velocity $v_s$ are kept fixed at 0.1 m and 2.5 m/s, 
respectively, unless otherwise stated. The initial agent distribution is uniform, as before, over a 
100 m × 100 m plane (with the exception of intentional voids). 

1. Figure 6.4 illustrates the situation where there is a void in the initial agent 
distribution. We note how the high-traffic paths go around the void, resulting 
from the neighbor-following mechanism. This is intuitively correct, as the 
small communication radius implies that no information is available on the 
failure probabilities inside the void. As the simulation progresses, the two 
paths going around the void close together, illustrating the fact that, as agents 
on the edge of the void drift inwards due to noisy position updates, the area 
with no agents diminishes. 

2. Figure 6.5 illustrates the scenario where we have two point targets. Note 
how the swarm splits automatically, and moves towards a particular target. 
This splitting decision is not taken a priori, but naturally emerges from the 
execution. 

Fig. 6.3: Effect of communication radius $R_c$ on swarm trajectories: Plates (a)-(d) 
illustrate the case for high $R_c = 3$ m, whereas plates (e)-(i) illustrate the case for 
$R_c = 0.1$ m. Note that the case for low $R_c$ develops high-traffic paths in the simulation, 
whereas for high $R_c$, the trajectories are more amorphous 

3. Figure 6.6 illustrates a scenario with multiple extended targets. Note how 
the swarm trajectories form temporary accumulation points (these can be seen 
towards the bottom of the target on the right), indicating the locations from 
where going towards either target is equally advantageous. Random choice 
of best neighbors is sufficient, without any additional effort, to ensure that such 
accumulations are only temporary. 

4. Figure 6.7 illustrates the scenario with an obstacle, whose position needs to 
be locally sensed, and is not known a priori or globally. 

5. Figure 6.8 illustrates the case with agents that are not interested in going 
to the target, but are willing to share information, i.e., carry out the node-based 
computations and relay the measure values to the neighbors. The difficulty 
is that such agents are not necessarily moving towards the goal, so following 
such an agent may prove detrimental. However, as these agents move 
in a direction which is not improving the measure gradient, their nodal updates 
begin diminishing their current measures. The simulation shows that 
a sparse number of agents (in blue) which intend to reach the target can 
accomplish this in the presence of the former type (in red). Note also that 
in this simulation the number of blue agents was chosen to be insufficient to 
form a connected network. Thus the information percolation is shown to be 
sufficient for the development of distinct paths to the target. 

Fig. 6.4: Illustrating the effect of a void in the initial agent distribution: the trajectories 
tend to avoid the void since every agent is following a neighbor, leading to distinct 
paths 

Fig. 6.5: Multiple point targets: Note that the swarm automatically splits to move 
towards chosen targets; such choices are not made a priori but naturally emerge from 
the execution 

Fig. 6.6: Illustrating multiple extended target regions: Note that the swarm automatically 
splits to move towards chosen targets (in green); as before such choices are not 
made a priori but naturally emerge from the execution 

7. Future Work. Future work will proceed in the following directions: 
1. Design explicit strategies for energy and congestion awareness within the 
proposed framework. Note that each agent can regulate incoming traffic by 
deliberately reporting lower values of its current self-measure to its neighbors: 

$$\underbrace{\hat{\nu}_\theta|_i}_{\text{Reported}} = C(q_i,k)\,\underbrace{\nu_\theta|_i}_{\text{Computed}} \qquad (7.1)$$

where $\forall q_i \in Q, k \in [0,\infty)$, $C(q_i,k) \in [0,1]$ is a multiplicative factor which 
is modulated to have decreasing values as local congestion increases. Such 
modulation forces automatic self-organization to compute alternate routes 
that tend to avoid the particular agent. The dynamics of such context-aware 
modulation may be non-trivial; while for slowly varying $C(q_i,k)$ the convergence 
results presented here are expected to hold true, rapid fluctuations in 
$C(q_i,k)$ may be problematic. 

2. We assumed that the link-specific failure probabilities are estimated at the 
agents. Grossly incorrect estimates will translate to incorrect routing decisions, 
and decentralized strategies for robust identification of these parameters 
need to be investigated in greater depth. More specifically, the proposed 
algorithm needs to be augmented with learning schemes (possibly reinforcement 
learning based approaches) that estimate such failure probabilities. 

3. Explicit design of implementation details such as packet headers, agent data 
structures and pertinent neighbor-neighbor communication protocols. 

4. Hardware validation on real systems. 
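
The congestion-aware reporting proposed in the first direction above can be illustrated with a short Python sketch. Everything specific here is an assumption made for illustration: the paper only requires a multiplicative factor in [0, 1] that decreases as local congestion increases, while the linear dependence on queue occupancy and the names `congestion_factor` and `reported_measure` are hypothetical.

```python
def congestion_factor(queue_len, capacity):
    """Hypothetical factor in [0, 1]: 1 when idle, falling linearly to 0 at capacity."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    return max(0.0, 1.0 - queue_len / capacity)


def reported_measure(computed, queue_len, capacity):
    """Measure advertised to neighbors: the computed measure scaled by the factor."""
    return congestion_factor(queue_len, capacity) * computed
```

Because neighbors forward only to agents with strictly higher reported measure, a congested agent that scales its measure down is bypassed whenever an alternate neighbor reports a higher value.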
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Fig. 6.7: Presence of obstacles with obstacle positions are locally sensed and are not 
known a priori or globally. Note how the high traffic paths go around 
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